A significant number of hotel bookings are called off due to cancellations or no-shows. Typical reasons for cancellation include changes of plans and scheduling conflicts. Cancelling is often made easier by the option to do so free of charge or at a low cost. This benefits hotel guests, but it is less desirable and possibly revenue-diminishing for hotels. Such losses are particularly high for last-minute cancellations.
New technologies involving online booking channels have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of how hotels handle cancellations, which are no longer limited to traditional booking and guest characteristics.
The cancellation of bookings impacts a hotel on various fronts:
The increasing number of cancellations calls for a machine-learning-based solution that can help predict which bookings are likely to be canceled. Star Hotels Group has a chain of hotels in Portugal. They are facing problems with the high number of booking cancellations and have reached out to your firm for data-driven solutions. As a data scientist, you have to analyze the data provided to find which factors have a high influence on booking cancellations, build a predictive model that can predict in advance which bookings are going to be canceled, and help in formulating profitable policies for cancellations and refunds.
The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.
Data Dictionary
#import sys
#!{sys.executable} -m pip install pandas-profiling
#Libraries to help with reading data and manipulating data
import numpy as np
import pandas as pd
# Libraries that support data visualization
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
# Libraries to suppress warnings
import warnings
warnings.filterwarnings("ignore")
sns.set()
# to split the data into train and test
from sklearn.model_selection import train_test_split
# to build logistic regression_model
from sklearn.linear_model import LogisticRegression
# to check model performance
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
# to build linear regression_model using statsmodels
import statsmodels.api as sm
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
# Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)
# To build model for prediction
import statsmodels.stats.api as sms
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant
# To get different metric scores
from sklearn.metrics import (
    f1_score,
    accuracy_score,
    recall_score,
    precision_score,
    confusion_matrix,
    roc_auc_score,
    plot_confusion_matrix,
    precision_recall_curve,
    roc_curve,
    make_scorer,
)
import sklearn.metrics as metrics
# To tune different models
from sklearn.model_selection import GridSearchCV
#Load the dataset
data = pd.read_csv("StarHotelsGroup.csv")
# Copy the data to another DataFrame to avoid changes to the original data
data_new = data.copy()
data_new.head() # display 5 rows from the dataframe
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled |
| 1 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled |
| 2 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | Canceled |
| 3 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | Canceled |
| 4 | 3 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 277 | 2019 | 7 | 13 | Online | 0 | 0 | 0 | 89.10 | 2 | Canceled |
data_new.shape
(56926, 18)
# Selecting duplicate rows except first
# occurrence based on all columns
duplicate = data_new[data_new.duplicated()]
duplicate
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29 | 2 | 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 99 | 2017 | 10 | 30 | Online | 0 | 0 | 0 | 65.00 | 0 | Canceled |
| 241 | 2 | 0 | 0 | 1 | Meal Plan 2 | 0 | Room_Type 1 | 55 | 2018 | 4 | 6 | Offline | 0 | 0 | 0 | 104.00 | 0 | Not_Canceled |
| 252 | 2 | 0 | 0 | 2 | Meal Plan 2 | 0 | Room_Type 1 | 282 | 2019 | 7 | 5 | Online | 0 | 0 | 0 | 141.30 | 1 | Canceled |
| 417 | 2 | 0 | 1 | 2 | Meal Plan 2 | 0 | Room_Type 1 | 161 | 2018 | 3 | 25 | Online | 0 | 0 | 0 | 130.00 | 0 | Canceled |
| 457 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 188 | 2018 | 6 | 15 | Online | 0 | 0 | 0 | 130.00 | 0 | Canceled |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 56905 | 1 | 0 | 2 | 4 | Meal Plan 1 | 0 | Room_Type 1 | 245 | 2018 | 7 | 6 | Offline | 0 | 0 | 0 | 110.00 | 0 | Canceled |
| 56907 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 116 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 1.00 | 0 | Not_Canceled |
| 56912 | 2 | 0 | 1 | 0 | Not Selected | 0 | Room_Type 1 | 49 | 2018 | 7 | 11 | Online | 0 | 0 | 0 | 93.15 | 0 | Canceled |
| 56913 | 1 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 166 | 2018 | 11 | 1 | Offline | 0 | 0 | 0 | 110.00 | 0 | Canceled |
| 56925 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 207 | 2018 | 12 | 30 | Offline | 0 | 0 | 0 | 161.67 | 0 | Not_Canceled |
14350 rows × 18 columns
# Dropping all the duplicates in the dataframe.
df = data_new.drop_duplicates()
df
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled |
| 1 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled |
| 2 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | Canceled |
| 3 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | Canceled |
| 4 | 3 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 277 | 2019 | 7 | 13 | Online | 0 | 0 | 0 | 89.10 | 2 | Canceled |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 56920 | 2 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 1 | 148 | 2018 | 7 | 1 | Online | 0 | 0 | 0 | 98.39 | 2 | Not_Canceled |
| 56921 | 2 | 1 | 0 | 1 | Meal Plan 2 | 0 | Room_Type 4 | 45 | 2019 | 6 | 15 | Online | 0 | 0 | 0 | 163.88 | 1 | Not_Canceled |
| 56922 | 2 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 320 | 2019 | 5 | 15 | Offline | 0 | 0 | 0 | 90.00 | 1 | Canceled |
| 56923 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 63 | 2018 | 4 | 21 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled |
| 56924 | 2 | 0 | 2 | 2 | Not Selected | 0 | Room_Type 1 | 6 | 2019 | 4 | 28 | Online | 0 | 0 | 0 | 162.50 | 2 | Not_Canceled |
42576 rows × 18 columns
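The row counts above are consistent with each other: 56926 rows before, 14350 flagged as duplicates, and 56926 - 14350 = 42576 rows after dropping them. As a minimal sketch of why that holds, the toy frame below (made-up values, not the hotel data; the name `toy` is only illustrative) shows that `duplicated()` flags every repeat after the first occurrence, which is exactly what `drop_duplicates()` removes by default.

```python
import pandas as pd

# Toy frame standing in for the bookings data (values are made up).
toy = pd.DataFrame(
    {
        "lead_time": [224, 5, 224, 5, 1],
        "booking_status": [
            "Not_Canceled",
            "Not_Canceled",
            "Not_Canceled",
            "Not_Canceled",
            "Canceled",
        ],
    }
)

# duplicated() marks repeats after the first occurrence of each row
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row
deduped = toy.drop_duplicates()

# Rows kept = rows before - rows flagged as duplicates
assert len(deduped) == len(toy) - n_duplicates
print(len(toy), n_duplicates, len(deduped))
```

Whether identical rows in the real data are true duplicates or distinct bookings that happen to share all attributes cannot be decided from the table alone, since it has no booking identifier.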
# Selecting duplicate rows except first
# occurrence based on all columns
duplicate_new = df[df.duplicated()]
duplicate_new
| no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 42576 entries, 0 to 56924
Data columns (total 18 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   no_of_adults                          42576 non-null  int64
 1   no_of_children                        42576 non-null  int64
 2   no_of_weekend_nights                  42576 non-null  int64
 3   no_of_week_nights                     42576 non-null  int64
 4   type_of_meal_plan                     42576 non-null  object
 5   required_car_parking_space            42576 non-null  int64
 6   room_type_reserved                    42576 non-null  object
 7   lead_time                             42576 non-null  int64
 8   arrival_year                          42576 non-null  int64
 9   arrival_month                         42576 non-null  int64
 10  arrival_date                          42576 non-null  int64
 11  market_segment_type                   42576 non-null  object
 12  repeated_guest                        42576 non-null  int64
 13  no_of_previous_cancellations          42576 non-null  int64
 14  no_of_previous_bookings_not_canceled  42576 non-null  int64
 15  avg_price_per_room                    42576 non-null  float64
 16  no_of_special_requests                42576 non-null  int64
 17  booking_status                        42576 non-null  object
dtypes: float64(1), int64(13), object(4)
memory usage: 6.2+ MB
df.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| no_of_adults | 42576.0 | 1.916737 | 0.527524 | 0.0 | 2.0 | 2.0 | 2.0 | 4.0 |
| no_of_children | 42576.0 | 0.142146 | 0.459920 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 |
| no_of_weekend_nights | 42576.0 | 0.895270 | 0.887864 | 0.0 | 0.0 | 1.0 | 2.0 | 8.0 |
| no_of_week_nights | 42576.0 | 2.321167 | 1.519328 | 0.0 | 1.0 | 2.0 | 3.0 | 17.0 |
| required_car_parking_space | 42576.0 | 0.034362 | 0.182160 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| lead_time | 42576.0 | 77.315953 | 77.279616 | 0.0 | 16.0 | 53.0 | 118.0 | 521.0 |
| arrival_year | 42576.0 | 2018.297891 | 0.626126 | 2017.0 | 2018.0 | 2018.0 | 2019.0 | 2019.0 |
| arrival_month | 42576.0 | 6.365488 | 3.051924 | 1.0 | 4.0 | 6.0 | 9.0 | 12.0 |
| arrival_date | 42576.0 | 15.682873 | 8.813991 | 1.0 | 8.0 | 16.0 | 23.0 | 31.0 |
| repeated_guest | 42576.0 | 0.030886 | 0.173011 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| no_of_previous_cancellations | 42576.0 | 0.025413 | 0.358194 | 0.0 | 0.0 | 0.0 | 0.0 | 13.0 |
| no_of_previous_bookings_not_canceled | 42576.0 | 0.222731 | 2.242308 | 0.0 | 0.0 | 0.0 | 0.0 | 72.0 |
| avg_price_per_room | 42576.0 | 112.375800 | 40.865896 | 0.0 | 85.5 | 107.0 | 135.0 | 540.0 |
| no_of_special_requests | 42576.0 | 0.768109 | 0.837264 | 0.0 | 0.0 | 1.0 | 1.0 | 5.0 |
By default, the describe() function summarizes only the numeric variables. Let's check the summary of the non-numeric variables.
df.describe(exclude='number').T
| | count | unique | top | freq |
|---|---|---|---|---|
| type_of_meal_plan | 42576 | 4 | Meal Plan 1 | 31863 |
| room_type_reserved | 42576 | 7 | Room_Type 1 | 29730 |
| market_segment_type | 42576 | 5 | Online | 34169 |
| booking_status | 42576 | 2 | Not_Canceled | 28089 |
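To make the difference between the two summaries concrete, here is a minimal sketch on a made-up two-column frame (the name `toy` and its values are illustrative only). It shows the default numeric-only behaviour of `describe()`, the `exclude="number"` variant used above, and `include="all"`, which combines both and fills statistics that do not apply with NaN.

```python
import pandas as pd

# Toy frame with one numeric and one text column (values are made up).
toy = pd.DataFrame(
    {
        "avg_price_per_room": [65.0, 106.68, 60.0],
        "market_segment_type": ["Offline", "Online", "Online"],
    }
)

# Default: only numeric columns are summarized
numeric_summary = toy.describe()

# exclude="number": only the non-numeric columns (count, unique, top, freq)
non_numeric_summary = toy.describe(exclude="number")

# include="all": every column, with NaN where a statistic does not apply
full_summary = toy.describe(include="all")

print(list(numeric_summary.columns))
print(list(non_numeric_summary.columns))
print(non_numeric_summary.loc["top", "market_segment_type"])
```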
category = ['type_of_meal_plan', 'room_type_reserved', 'market_segment_type', 'booking_status']
for column in category:
    print(df[column].value_counts())
    print('_'*40)
Meal Plan 1     31863
Not Selected     8716
Meal Plan 2      1989
Meal Plan 3         8
Name: type_of_meal_plan, dtype: int64
________________________________________
Room_Type 1    29730
Room_Type 4     9369
Room_Type 6     1540
Room_Type 5      906
Room_Type 2      718
Room_Type 7      307
Room_Type 3        6
Name: room_type_reserved, dtype: int64
________________________________________
Online           34169
Offline           5777
Corporate         1939
Complementary      496
Aviation           195
Name: market_segment_type, dtype: int64
________________________________________
Not_Canceled    28089
Canceled        14487
Name: booking_status, dtype: int64
________________________________________
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 42576 entries, 0 to 56924
Data columns (total 18 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   no_of_adults                          42576 non-null  int64
 1   no_of_children                        42576 non-null  int64
 2   no_of_weekend_nights                  42576 non-null  int64
 3   no_of_week_nights                     42576 non-null  int64
 4   type_of_meal_plan                     42576 non-null  object
 5   required_car_parking_space            42576 non-null  int64
 6   room_type_reserved                    42576 non-null  object
 7   lead_time                             42576 non-null  int64
 8   arrival_year                          42576 non-null  int64
 9   arrival_month                         42576 non-null  int64
 10  arrival_date                          42576 non-null  int64
 11  market_segment_type                   42576 non-null  object
 12  repeated_guest                        42576 non-null  int64
 13  no_of_previous_cancellations          42576 non-null  int64
 14  no_of_previous_bookings_not_canceled  42576 non-null  int64
 15  avg_price_per_room                    42576 non-null  float64
 16  no_of_special_requests                42576 non-null  int64
 17  booking_status                        42576 non-null  object
dtypes: float64(1), int64(13), object(4)
memory usage: 6.2+ MB
print(df.type_of_meal_plan.unique())
['Meal Plan 1' 'Not Selected' 'Meal Plan 2' 'Meal Plan 3']
print(df.room_type_reserved.unique())
['Room_Type 1' 'Room_Type 4' 'Room_Type 6' 'Room_Type 5' 'Room_Type 2' 'Room_Type 7' 'Room_Type 3']
print(df.market_segment_type.unique())
['Offline' 'Online' 'Corporate' 'Aviation' 'Complementary']
print(df.booking_status.unique())
['Not_Canceled' 'Canceled']
df.describe(include='all').T
| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| no_of_adults | 42576.0 | NaN | NaN | NaN | 1.916737 | 0.527524 | 0.0 | 2.0 | 2.0 | 2.0 | 4.0 |
| no_of_children | 42576.0 | NaN | NaN | NaN | 0.142146 | 0.45992 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 |
| no_of_weekend_nights | 42576.0 | NaN | NaN | NaN | 0.89527 | 0.887864 | 0.0 | 0.0 | 1.0 | 2.0 | 8.0 |
| no_of_week_nights | 42576.0 | NaN | NaN | NaN | 2.321167 | 1.519328 | 0.0 | 1.0 | 2.0 | 3.0 | 17.0 |
| type_of_meal_plan | 42576 | 4 | Meal Plan 1 | 31863 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| required_car_parking_space | 42576.0 | NaN | NaN | NaN | 0.034362 | 0.18216 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| room_type_reserved | 42576 | 7 | Room_Type 1 | 29730 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| lead_time | 42576.0 | NaN | NaN | NaN | 77.315953 | 77.279616 | 0.0 | 16.0 | 53.0 | 118.0 | 521.0 |
| arrival_year | 42576.0 | NaN | NaN | NaN | 2018.297891 | 0.626126 | 2017.0 | 2018.0 | 2018.0 | 2019.0 | 2019.0 |
| arrival_month | 42576.0 | NaN | NaN | NaN | 6.365488 | 3.051924 | 1.0 | 4.0 | 6.0 | 9.0 | 12.0 |
| arrival_date | 42576.0 | NaN | NaN | NaN | 15.682873 | 8.813991 | 1.0 | 8.0 | 16.0 | 23.0 | 31.0 |
| market_segment_type | 42576 | 5 | Online | 34169 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| repeated_guest | 42576.0 | NaN | NaN | NaN | 0.030886 | 0.173011 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| no_of_previous_cancellations | 42576.0 | NaN | NaN | NaN | 0.025413 | 0.358194 | 0.0 | 0.0 | 0.0 | 0.0 | 13.0 |
| no_of_previous_bookings_not_canceled | 42576.0 | NaN | NaN | NaN | 0.222731 | 2.242308 | 0.0 | 0.0 | 0.0 | 0.0 | 72.0 |
| avg_price_per_room | 42576.0 | NaN | NaN | NaN | 112.3758 | 40.865896 | 0.0 | 85.5 | 107.0 | 135.0 | 540.0 |
| no_of_special_requests | 42576.0 | NaN | NaN | NaN | 0.768109 | 0.837264 | 0.0 | 0.0 | 1.0 | 1.0 | 5.0 |
| booking_status | 42576 | 2 | Not_Canceled | 28089 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
df.isna().sum()
no_of_adults                            0
no_of_children                          0
no_of_weekend_nights                    0
no_of_week_nights                       0
type_of_meal_plan                       0
required_car_parking_space              0
room_type_reserved                      0
lead_time                               0
arrival_year                            0
arrival_month                           0
arrival_date                            0
market_segment_type                     0
repeated_guest                          0
no_of_previous_cancellations            0
no_of_previous_bookings_not_canceled    0
avg_price_per_room                      0
no_of_special_requests                  0
booking_status                          0
dtype: int64
df.isnull().sum()
no_of_adults                            0
no_of_children                          0
no_of_weekend_nights                    0
no_of_week_nights                       0
type_of_meal_plan                       0
required_car_parking_space              0
room_type_reserved                      0
lead_time                               0
arrival_year                            0
arrival_month                           0
arrival_date                            0
market_segment_type                     0
repeated_guest                          0
no_of_previous_cancellations            0
no_of_previous_bookings_not_canceled    0
avg_price_per_room                      0
no_of_special_requests                  0
booking_status                          0
dtype: int64
['type_of_meal_plan', 'room_type_reserved', 'market_segment_type', 'booking_status']
['type_of_meal_plan', 'room_type_reserved', 'market_segment_type', 'booking_status']
df.type_of_meal_plan = df.type_of_meal_plan.astype('category')
df.room_type_reserved = df.room_type_reserved.astype('category')
df.market_segment_type = df.market_segment_type.astype('category')
df.booking_status = df.booking_status.astype('category')
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 42576 entries, 0 to 56924
Data columns (total 18 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   no_of_adults                          42576 non-null  int64
 1   no_of_children                        42576 non-null  int64
 2   no_of_weekend_nights                  42576 non-null  int64
 3   no_of_week_nights                     42576 non-null  int64
 4   type_of_meal_plan                     42576 non-null  category
 5   required_car_parking_space            42576 non-null  int64
 6   room_type_reserved                    42576 non-null  category
 7   lead_time                             42576 non-null  int64
 8   arrival_year                          42576 non-null  int64
 9   arrival_month                         42576 non-null  int64
 10  arrival_date                          42576 non-null  int64
 11  market_segment_type                   42576 non-null  category
 12  repeated_guest                        42576 non-null  int64
 13  no_of_previous_cancellations          42576 non-null  int64
 14  no_of_previous_bookings_not_canceled  42576 non-null  int64
 15  avg_price_per_room                    42576 non-null  float64
 16  no_of_special_requests                42576 non-null  int64
 17  booking_status                        42576 non-null  category
dtypes: category(4), float64(1), int64(13)
memory usage: 5.0 MB
The object variables are converted to categorical variables.
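The drop in reported memory usage (from 6.2+ MB to 5.0 MB in the `info()` outputs) comes from this conversion. A minimal sketch of the effect, assuming a made-up frame with one repetitive text column (the names `toy`, `converted`, `before`, and `after` are illustrative only):

```python
import pandas as pd

# Toy frame with a low-cardinality text column (values are made up).
toy = pd.DataFrame(
    {
        "type_of_meal_plan": ["Meal Plan 1", "Not Selected", "Meal Plan 1"] * 1000,
        "lead_time": [224, 5, 1] * 1000,
    }
)

# deep=True counts the actual string storage, not just the pointers
before = toy.memory_usage(deep=True).sum()

# Convert the text column to the category dtype on a copy
converted = toy.copy()
for col in ["type_of_meal_plan"]:
    converted[col] = converted[col].astype("category")

after = converted.memory_usage(deep=True).sum()

print(converted.dtypes)
print(before, after)
```

A category column stores each distinct label once plus a small integer code per row, so the saving grows with the number of rows and shrinks with the number of distinct labels.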
Questions:
sns.histplot(df.no_of_adults, kde=True);
sns.boxplot(df.no_of_adults,orient = "h");
sns.histplot(df.no_of_children, kde=True);
sns.boxplot(df.no_of_children,orient = "h");
sns.histplot(df.no_of_weekend_nights, kde=True);
sns.boxplot(df.no_of_weekend_nights,orient = "h");
sns.histplot(df.required_car_parking_space, kde=True);
sns.histplot(df.lead_time)
<AxesSubplot:xlabel='lead_time', ylabel='Count'>
sns.boxplot(df.lead_time,orient = "h");
sns.histplot(df.arrival_month)
<AxesSubplot:xlabel='arrival_month', ylabel='Count'>
sns.boxplot(df.arrival_month,orient = "h");
sns.histplot(df.repeated_guest)
<AxesSubplot:xlabel='repeated_guest', ylabel='Count'>
sns.histplot(df.no_of_previous_cancellations)
<AxesSubplot:xlabel='no_of_previous_cancellations', ylabel='Count'>
Most customers had not canceled a previous booking.
sns.histplot(df.no_of_previous_bookings_not_canceled)
<AxesSubplot:xlabel='no_of_previous_bookings_not_canceled', ylabel='Count'>
no_of_previous_bookings_not_canceled is 0 in most cases.
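Statements like "0 in most cases" can be backed with a number rather than read off a histogram. The sketch below uses a made-up series (not the hotel data) to show one way of computing the share of zero values; on the real data the same expression could be applied to `df.no_of_previous_cancellations` or `df.no_of_previous_bookings_not_canceled`.

```python
import pandas as pd

# Made-up counts standing in for a zero-heavy column
toy = pd.Series([0, 0, 0, 1, 0, 3, 0, 0], name="no_of_previous_cancellations")

# The mean of a boolean mask is the fraction of True values
share_zero = (toy == 0).mean()

print(f"{share_zero:.1%} of the toy bookings have zero previous cancellations")
```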
sns.histplot(df.no_of_special_requests)
<AxesSubplot:xlabel='no_of_special_requests', ylabel='Count'>
sns.histplot(df.type_of_meal_plan)
<AxesSubplot:xlabel='type_of_meal_plan', ylabel='Count'>
sns.countplot(df.room_type_reserved)
<AxesSubplot:xlabel='room_type_reserved', ylabel='count'>
sns.countplot(df.market_segment_type)
<AxesSubplot:xlabel='market_segment_type', ylabel='count'>
sns.countplot(df.booking_status)
<AxesSubplot:xlabel='booking_status', ylabel='count'>
# Encode the target: 1 = Canceled, 0 = Not_Canceled
df["booking_status"] = df["booking_status"].map({"Canceled": 1, "Not_Canceled": 0}).astype(int)
column_list = df.select_dtypes(include=np.number).columns.tolist()
# Correlation heatmap of the numeric columns
plt.figure(figsize=(15, 7))
sns.heatmap(
df[column_list].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral"
)
plt.show()
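A heatmap with this many columns can be hard to read for the one row that matters most here, the correlation of each feature with `booking_status`. As a hedged sketch (toy values, not the hotel data; `corr_with_target` is an illustrative name), that row can be extracted and sorted by absolute value:

```python
import pandas as pd

# Toy numeric frame; booking_status is already encoded as 1 = canceled
toy = pd.DataFrame(
    {
        "lead_time": [200, 10, 180, 30],
        "no_of_special_requests": [0, 2, 1, 1],
        "booking_status": [1, 0, 1, 0],
    }
)

# Pearson correlation of every feature with the target, strongest first
corr_with_target = (
    toy.corr()["booking_status"]
    .drop("booking_status")
    .sort_values(key=abs, ascending=False)
)

print(corr_with_target)
```

Pearson correlation only captures linear association, so a low value here does not rule out a feature being useful to a tree-based model.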
sns.pairplot(df);
sns.countplot(x="type_of_meal_plan", hue="room_type_reserved", data=df, palette='Set1',saturation=50 );
sns.countplot(x="type_of_meal_plan", hue="market_segment_type", data=df, palette='Set1',saturation=50 );
sns.countplot(x="type_of_meal_plan", hue="booking_status", data=df, palette='Set1',saturation=50 );
sns.countplot(x="room_type_reserved", hue="market_segment_type", data=df, palette='Set1',saturation=50 );
sns.countplot(x="market_segment_type", hue="booking_status", data=df, palette='Set1',saturation=50 );
sns.countplot(x="room_type_reserved", hue="booking_status", data=df, palette='Set1',saturation=50 );
fig = plt.figure(figsize= (10,5))
ax = sns.regplot(x ='no_of_adults', y = df['avg_price_per_room'], data = df)
fig = plt.figure(figsize= (10,5))
ax = sns.lineplot(x ='no_of_children', y = df['booking_status'], data = df)
fig = plt.figure(figsize= (10,5))
ax = sns.regplot(x ='no_of_weekend_nights', y = df['booking_status'], data = df)
fig = plt.figure(figsize= (10,5))
ax = sns.regplot(x ='no_of_week_nights', y = df['booking_status'], data = df)
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='required_car_parking_space', y = df['booking_status'], data = df)
fig = plt.figure(figsize= (10,5))
ax = sns.lineplot(x ='lead_time', y = df['booking_status'], data = df)
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='arrival_year', y = df['booking_status'], data = df)
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='arrival_month', y = df['booking_status'], data = df)
fig = plt.figure(figsize= (10,5))
ax = sns.regplot(x ='no_of_previous_cancellations', y = df['booking_status'], data = df)
fig = plt.figure(figsize= (10,5))
ax = sns.scatterplot(x ='no_of_previous_bookings_not_canceled', y = df['booking_status'], data = df)
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='no_of_special_requests', y = df['booking_status'], data = df)
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='booking_status', y = df['avg_price_per_room'], data = df)
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='type_of_meal_plan', y = df['booking_status'], data = df)
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='market_segment_type', y = df['booking_status'], data = df)
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='room_type_reserved', y = df['booking_status'], data = df)
figure = plt.figure(figsize=(8,7))
sns.barplot(x="type_of_meal_plan", y="avg_price_per_room", data=df, hue='room_type_reserved', palette='tab10' )
plt.show()
figure = plt.figure(figsize=(8,7))
sns.barplot(x="type_of_meal_plan", y="avg_price_per_room", data=df, hue='market_segment_type', palette='tab10' )
plt.show()
figure = plt.figure(figsize=(8,7))
sns.barplot(x="type_of_meal_plan", y="avg_price_per_room", data=df, hue='booking_status', palette='tab10' )
plt.show()
figure = plt.figure(figsize=(8,7))
sns.pointplot(x="room_type_reserved", y="avg_price_per_room", data=df, hue='market_segment_type', palette='tab10' )
plt.show()
figure = plt.figure(figsize=(8,7))
sns.barplot(x="room_type_reserved", y="avg_price_per_room", data=df, hue='market_segment_type', palette='tab10' )
plt.show()
figure = plt.figure(figsize=(8,7))
sns.pointplot(x="room_type_reserved", y="avg_price_per_room", data=df, hue='booking_status', palette='tab10' )
plt.show()
figure = plt.figure(figsize=(8,7))
sns.barplot(x="room_type_reserved", y="avg_price_per_room", data=df, hue='booking_status', palette='tab10' )
plt.show()
figure = plt.figure(figsize=(8,7))
sns.pointplot(x="market_segment_type", y="avg_price_per_room", data=df, hue='booking_status', palette='tab10' )
plt.show()
figure = plt.figure(figsize=(8,7))
sns.barplot(x="market_segment_type", y="avg_price_per_room", data=df, hue='booking_status', palette='tab10' )
plt.show()
1. The average room price of bookings that were not canceled is lower than that of canceled bookings across all market segments.
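The comparison read off the bar plot can also be checked numerically. The following is a minimal sketch on made-up values (not the hotel data) that computes the mean price per market segment and booking status and the gap between the two; it assumes `booking_status` is already encoded as 1 for canceled and 0 for not canceled, and the column name `canceled_minus_kept` is purely illustrative.

```python
import pandas as pd

# Toy bookings (values are made up); booking_status: 1 = canceled
toy = pd.DataFrame(
    {
        "market_segment_type": ["Online", "Online", "Offline", "Offline", "Online", "Offline"],
        "booking_status": [1, 0, 1, 0, 1, 0],
        "avg_price_per_room": [120.0, 100.0, 95.0, 80.0, 130.0, 90.0],
    }
)

# Mean price per segment, one column per booking status
mean_price = (
    toy.groupby(["market_segment_type", "booking_status"])["avg_price_per_room"]
    .mean()
    .unstack("booking_status")
)

# Positive values mean canceled bookings were pricier on average
mean_price["canceled_minus_kept"] = mean_price[1] - mean_price[0]

print(mean_price)
```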
figure = plt.figure(figsize=(8,7))
sns.pointplot(x="market_segment_type", y="no_of_previous_cancellations", data=df, hue='booking_status', palette='tab10' )
plt.show()
figure = plt.figure(figsize=(8,7))
sns.pointplot(x="type_of_meal_plan", y="no_of_previous_bookings_not_canceled", data=df, hue='booking_status', palette='tab10' )
plt.show()
figure = plt.figure(figsize=(8,7))
sns.barplot(x="type_of_meal_plan", y="no_of_previous_cancellations", data=df, hue='market_segment_type', palette='tab10' )
plt.show()
figure = plt.figure(figsize=(8,7))
sns.barplot(x="type_of_meal_plan", y="no_of_special_requests", data=df, hue='room_type_reserved', palette='tab10' )
plt.show()
ax = sns.violinplot(x =data.room_type_reserved, y = data['avg_price_per_room'])
ax = sns.violinplot(x =data.type_of_meal_plan, y = data['avg_price_per_room'])
sns.histplot(df.arrival_month)
<AxesSubplot:xlabel='arrival_month', ylabel='Count'>
sns.boxplot(df.arrival_month,orient = "h");
sns.countplot(df.market_segment_type)
<AxesSubplot:xlabel='market_segment_type', ylabel='count'>
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='market_segment_type', y = df['avg_price_per_room'], data = df)
def labeled_barplot(data, feature, perc=False, n=None):
    """Count plot of `feature` with each bar labeled by its count or percentage."""
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))
    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )
    for p in ax.patches:
        if perc:
            # percentage of each level of the category
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )
        else:
            label = p.get_height()  # count of each level of the category
        x = p.get_x() + p.get_width() / 2  # horizontal center of the bar
        y = p.get_height()  # top of the bar
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )
    plt.show()
labeled_barplot(df, "booking_status", perc=True)
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='booking_status', y = df['repeated_guest'], data = df)
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='no_of_previous_cancellations', y = df['repeated_guest'], data = df)
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='booking_status', y = df['no_of_special_requests'], data = df)
# let's create a copy of the data
df1 = df.copy()
np.random.seed(42)
df1.sample(n=10)  # Return a random sample of 10 rows from the dataframe df1
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18469 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 206 | 2018 | 8 | 29 | Online | 0 | 0 | 0 | 90.95 | 0 | 1 |
| 28961 | 2 | 0 | 0 | 4 | Meal Plan 1 | 0 | Room_Type 1 | 44 | 2019 | 6 | 14 | Online | 0 | 0 | 0 | 150.00 | 2 | 0 |
| 10871 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 2 | 2018 | 7 | 9 | Online | 0 | 0 | 0 | 116.27 | 1 | 0 |
| 6269 | 2 | 0 | 0 | 1 | Not Selected | 0 | Room_Type 1 | 6 | 2018 | 9 | 16 | Online | 0 | 0 | 0 | 149.00 | 1 | 0 |
| 47504 | 2 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 21 | 2017 | 9 | 3 | Online | 0 | 0 | 0 | 105.00 | 2 | 0 |
| 48838 | 2 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 4 | 59 | 2019 | 3 | 20 | Online | 0 | 0 | 0 | 106.20 | 2 | 0 |
| 49094 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 10 | 2019 | 2 | 22 | Online | 0 | 0 | 0 | 118.00 | 0 | 0 |
| 8494 | 2 | 1 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 75 | 2018 | 10 | 20 | Online | 0 | 0 | 0 | 135.00 | 0 | 1 |
| 41272 | 2 | 0 | 1 | 3 | Not Selected | 0 | Room_Type 1 | 95 | 2019 | 5 | 8 | Online | 0 | 0 | 0 | 108.00 | 0 | 1 |
| 29451 | 2 | 0 | 1 | 2 | Not Selected | 0 | Room_Type 1 | 0 | 2017 | 8 | 31 | Online | 0 | 0 | 0 | 96.33 | 1 | 0 |
df1.isnull().sum().sort_values(ascending = False)
#Return the count of missing values column-wise and sort them in descending order
no_of_adults                            0
no_of_children                          0
no_of_special_requests                  0
avg_price_per_room                      0
no_of_previous_bookings_not_canceled    0
no_of_previous_cancellations            0
repeated_guest                          0
market_segment_type                     0
arrival_date                            0
arrival_month                           0
arrival_year                            0
lead_time                               0
room_type_reserved                      0
required_car_parking_space              0
type_of_meal_plan                       0
no_of_week_nights                       0
no_of_weekend_nights                    0
booking_status                          0
dtype: int64
df1
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | 0 |
| 1 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | 0 |
| 2 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | 1 |
| 3 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | 1 |
| 4 | 3 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 277 | 2019 | 7 | 13 | Online | 0 | 0 | 0 | 89.10 | 2 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 56920 | 2 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 1 | 148 | 2018 | 7 | 1 | Online | 0 | 0 | 0 | 98.39 | 2 | 0 |
| 56921 | 2 | 1 | 0 | 1 | Meal Plan 2 | 0 | Room_Type 4 | 45 | 2019 | 6 | 15 | Online | 0 | 0 | 0 | 163.88 | 1 | 0 |
| 56922 | 2 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 320 | 2019 | 5 | 15 | Offline | 0 | 0 | 0 | 90.00 | 1 | 1 |
| 56923 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 63 | 2018 | 4 | 21 | Online | 0 | 0 | 0 | 94.50 | 0 | 1 |
| 56924 | 2 | 0 | 2 | 2 | Not Selected | 0 | Room_Type 1 | 6 | 2019 | 4 | 28 | Online | 0 | 0 | 0 | 162.50 | 2 | 0 |
42576 rows × 18 columns
# Let's look at the statistical summary of the data
df1.describe(include="all").T
| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| no_of_adults | 42576.0 | NaN | NaN | NaN | 1.916737 | 0.527524 | 0.0 | 2.0 | 2.0 | 2.0 | 4.0 |
| no_of_children | 42576.0 | NaN | NaN | NaN | 0.142146 | 0.45992 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 |
| no_of_weekend_nights | 42576.0 | NaN | NaN | NaN | 0.89527 | 0.887864 | 0.0 | 0.0 | 1.0 | 2.0 | 8.0 |
| no_of_week_nights | 42576.0 | NaN | NaN | NaN | 2.321167 | 1.519328 | 0.0 | 1.0 | 2.0 | 3.0 | 17.0 |
| type_of_meal_plan | 42576 | 4 | Meal Plan 1 | 31863 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| required_car_parking_space | 42576.0 | NaN | NaN | NaN | 0.034362 | 0.18216 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| room_type_reserved | 42576 | 7 | Room_Type 1 | 29730 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| lead_time | 42576.0 | NaN | NaN | NaN | 77.315953 | 77.279616 | 0.0 | 16.0 | 53.0 | 118.0 | 521.0 |
| arrival_year | 42576.0 | NaN | NaN | NaN | 2018.297891 | 0.626126 | 2017.0 | 2018.0 | 2018.0 | 2019.0 | 2019.0 |
| arrival_month | 42576.0 | NaN | NaN | NaN | 6.365488 | 3.051924 | 1.0 | 4.0 | 6.0 | 9.0 | 12.0 |
| arrival_date | 42576.0 | NaN | NaN | NaN | 15.682873 | 8.813991 | 1.0 | 8.0 | 16.0 | 23.0 | 31.0 |
| market_segment_type | 42576 | 5 | Online | 34169 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| repeated_guest | 42576.0 | NaN | NaN | NaN | 0.030886 | 0.173011 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| no_of_previous_cancellations | 42576.0 | NaN | NaN | NaN | 0.025413 | 0.358194 | 0.0 | 0.0 | 0.0 | 0.0 | 13.0 |
| no_of_previous_bookings_not_canceled | 42576.0 | NaN | NaN | NaN | 0.222731 | 2.242308 | 0.0 | 0.0 | 0.0 | 0.0 | 72.0 |
| avg_price_per_room | 42576.0 | NaN | NaN | NaN | 112.3758 | 40.865896 | 0.0 | 85.5 | 107.0 | 135.0 | 540.0 |
| no_of_special_requests | 42576.0 | NaN | NaN | NaN | 0.768109 | 0.837264 | 0.0 | 0.0 | 1.0 | 1.0 | 5.0 |
| booking_status | 42576.0 | NaN | NaN | NaN | 0.340262 | 0.473803 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
df1['type_of_meal_plan'].value_counts()
Meal Plan 1     31863
Not Selected     8716
Meal Plan 2      1989
Meal Plan 3         8
Name: type_of_meal_plan, dtype: int64
df1['room_type_reserved'].value_counts()
Room_Type 1    29730
Room_Type 4     9369
Room_Type 6     1540
Room_Type 5      906
Room_Type 2      718
Room_Type 7      307
Room_Type 3        6
Name: room_type_reserved, dtype: int64
df1['market_segment_type'].value_counts()
Online           34169
Offline           5777
Corporate         1939
Complementary      496
Aviation           195
Name: market_segment_type, dtype: int64
df1['booking_status'].value_counts()
0    28089
1    14487
Name: booking_status, dtype: int64
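The counts above can also be read as proportions, which makes the class imbalance explicit. The sketch below is only an illustration: it rebuilds a stand-in series from the two printed counts (the names `status` and `shares` are not from the notebook) and assumes 0 = not canceled and 1 = canceled, as the encoded target suggests.

```python
import pandas as pd

# Stand-in for df1["booking_status"], rebuilt from the printed counts
# (assumption: 0 = not canceled, 1 = canceled).
status = pd.Series([0] * 28089 + [1] * 14487, name="booking_status")

# normalize=True returns the share of each class instead of raw counts
shares = status.value_counts(normalize=True)
print(shares.round(4))

# Share of canceled bookings; this matches the mean of booking_status
cancel_rate = shares.loc[1]
print(round(cancel_rate, 4))
```

On the real data the same result should come from `df1["booking_status"].value_counts(normalize=True)`.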
# outlier detection using boxplots
numeric_columns = df1.select_dtypes(include=np.number).columns.tolist()
# let's plot the boxplots of all numeric columns to check for outliers
plt.figure(figsize=(20, 30))
for i, variable in enumerate(numeric_columns):
    plt.subplot(5, 4, i + 1)
    plt.boxplot(df1[variable], whis=1.5)
    plt.title(variable)
plt.tight_layout()
plt.show()
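The boxplots flag outliers visually; the same 1.5 × IQR rule (matching `whis=1.5`) can be applied numerically to count them. This is a minimal sketch on made-up numbers; the helper name `iqr_outlier_count` is hypothetical and not part of the notebook.

```python
import numpy as np

def iqr_outlier_count(values, whis=1.5):
    """Count points outside [Q1 - whis*IQR, Q3 + whis*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return int(np.sum((values < lower) | (values > upper)))

# Illustrative data, not taken from the hotel dataset
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
n_out = iqr_outlier_count(sample)
print(n_out)
```

Note that this counts rows, whereas a boxplot draws one marker per distinct value, so the two numbers can differ for integer-valued columns such as `no_of_children`.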
# let's check the statistical summary of the data once
df1.describe(include="all").T
| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| no_of_adults | 42576.0 | NaN | NaN | NaN | 1.916737 | 0.527524 | 0.0 | 2.0 | 2.0 | 2.0 | 4.0 |
| no_of_children | 42576.0 | NaN | NaN | NaN | 0.142146 | 0.45992 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 |
| no_of_weekend_nights | 42576.0 | NaN | NaN | NaN | 0.89527 | 0.887864 | 0.0 | 0.0 | 1.0 | 2.0 | 8.0 |
| no_of_week_nights | 42576.0 | NaN | NaN | NaN | 2.321167 | 1.519328 | 0.0 | 1.0 | 2.0 | 3.0 | 17.0 |
| type_of_meal_plan | 42576 | 4 | Meal Plan 1 | 31863 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| required_car_parking_space | 42576.0 | NaN | NaN | NaN | 0.034362 | 0.18216 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| room_type_reserved | 42576 | 7 | Room_Type 1 | 29730 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| lead_time | 42576.0 | NaN | NaN | NaN | 77.315953 | 77.279616 | 0.0 | 16.0 | 53.0 | 118.0 | 521.0 |
| arrival_year | 42576.0 | NaN | NaN | NaN | 2018.297891 | 0.626126 | 2017.0 | 2018.0 | 2018.0 | 2019.0 | 2019.0 |
| arrival_month | 42576.0 | NaN | NaN | NaN | 6.365488 | 3.051924 | 1.0 | 4.0 | 6.0 | 9.0 | 12.0 |
| arrival_date | 42576.0 | NaN | NaN | NaN | 15.682873 | 8.813991 | 1.0 | 8.0 | 16.0 | 23.0 | 31.0 |
| market_segment_type | 42576 | 5 | Online | 34169 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| repeated_guest | 42576.0 | NaN | NaN | NaN | 0.030886 | 0.173011 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| no_of_previous_cancellations | 42576.0 | NaN | NaN | NaN | 0.025413 | 0.358194 | 0.0 | 0.0 | 0.0 | 0.0 | 13.0 |
| no_of_previous_bookings_not_canceled | 42576.0 | NaN | NaN | NaN | 0.222731 | 2.242308 | 0.0 | 0.0 | 0.0 | 0.0 | 72.0 |
| avg_price_per_room | 42576.0 | NaN | NaN | NaN | 112.3758 | 40.865896 | 0.0 | 85.5 | 107.0 | 135.0 | 540.0 |
| no_of_special_requests | 42576.0 | NaN | NaN | NaN | 0.768109 | 0.837264 | 0.0 | 0.0 | 1.0 | 1.0 | 5.0 |
| booking_status | 42576.0 | NaN | NaN | NaN | 0.340262 | 0.473803 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
data_new.head() # display 5 rows from the dataframe
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled |
| 1 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled |
| 2 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | Canceled |
| 3 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | Canceled |
| 4 | 3 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 277 | 2019 | 7 | 13 | Online | 0 | 0 | 0 | 89.10 | 2 | Canceled |
df1.shape
(42576, 18)
sns.histplot(df1.no_of_adults, kde=True);
sns.boxplot(df1.no_of_adults,orient = "h");
no_of_adults = 2 in most of the cases.
sns.histplot(df1.no_of_children, kde=True);
sns.boxplot(df1.no_of_children,orient = "h");
In most cases no_of_children = 0. The distribution is positively skewed. There are 5 outliers.
sns.histplot(df1.no_of_weekend_nights, kde=True);
sns.boxplot(df1.no_of_weekend_nights,orient = "h");
There are 3 outliers.
sns.histplot(df1.required_car_parking_space, kde=True);
sns.histplot(df1.lead_time)
<AxesSubplot:xlabel='lead_time', ylabel='Count'>
sns.boxplot(df1.lead_time,orient = "h");
The distribution is positively skewed. There are many outliers.
sns.histplot(df1.arrival_month)
<AxesSubplot:xlabel='arrival_month', ylabel='Count'>
Most bookings are in August. The fewest bookings are in January and November.
sns.boxplot(df1.arrival_month,orient = "h");
sns.histplot(df1.repeated_guest)
<AxesSubplot:xlabel='repeated_guest', ylabel='Count'>
Most customers are not repeated guests: the mean of repeated_guest is about 0.03, so only around 3% of bookings come from repeated guests.
sns.histplot(df1.no_of_previous_cancellations)
<AxesSubplot:xlabel='no_of_previous_cancellations', ylabel='Count'>
Most customers have no previous cancellations.
sns.histplot(df1.no_of_previous_bookings_not_canceled)
<AxesSubplot:xlabel='no_of_previous_bookings_not_canceled', ylabel='Count'>
no_of_previous_bookings_not_canceled = 0 in most cases.
sns.histplot(df1.no_of_special_requests)
<AxesSubplot:xlabel='no_of_special_requests', ylabel='Count'>
The number of special requests ranges from 0 to 5, and most bookings have 0 or 1.
sns.histplot(df1.type_of_meal_plan)
<AxesSubplot:xlabel='type_of_meal_plan', ylabel='Count'>
Most customers choose Meal Plan 1. Some customers do not select any meal plan.
sns.countplot(df1.room_type_reserved)
<AxesSubplot:xlabel='room_type_reserved', ylabel='count'>
Most customers prefer Room Type 1. The least preferred room is Room Type 3, followed by Room Type 7.
sns.countplot(df1.market_segment_type)
<AxesSubplot:xlabel='market_segment_type', ylabel='count'>
Most customers book through the Online market segment. The least used segment is Aviation.
sns.countplot(df1.booking_status)
<AxesSubplot:xlabel='booking_status', ylabel='count'>
Most customers have not canceled their booking: 28,089 of 42,576 bookings (about 66%) were not canceled.
sns.countplot(x="type_of_meal_plan", hue="room_type_reserved", data=df1, palette='Set1',saturation=50 );
Room_Type 1 is the most preferred, followed by Room_Type 4. Customers who choose Meal Plan 1 mostly book Room_Type 1.
sns.countplot(x="type_of_meal_plan", hue="market_segment_type", data=df1, palette='Set1',saturation=50 );
sns.countplot(x="type_of_meal_plan", hue="booking_status", data=df1, palette='Set1',saturation=50 );
sns.countplot(x="room_type_reserved", hue="market_segment_type", data=df1, palette='Set1',saturation=50 );
Customers mostly book through the Online market segment. The most common combination is Room Type 1 booked online.
sns.countplot(x="market_segment_type", hue="booking_status", data=df1, palette='Set1',saturation=50 );
Bookings from the Online market segment had more cancellations than those from any other segment. Overall, there are more bookings with booking_status = Not_Canceled.
sns.countplot(x="room_type_reserved", hue="booking_status", data=df1, palette='Set1',saturation=50 );
Bookings for Room_Type 1 had more cancellations than those for other room types. Overall, there are more bookings with booking_status = Not_Canceled.
fig = plt.figure(figsize= (10,5))
ax = sns.regplot(x ='no_of_adults', y = df1['avg_price_per_room'], data = df1)
The average room price tends to increase with the number of adults. The highest individual prices occur when no_of_adults = 2.
fig = plt.figure(figsize= (10,5))
ax = sns.lineplot(x ='no_of_children', y = df1['booking_status'], data = df1)
The chance of a booking being canceled is lower when there are more children.
fig = plt.figure(figsize= (10,5))
ax = sns.regplot(x ='no_of_weekend_nights', y = df1['booking_status'], data = df1)
The chance of cancellation increases with no_of_weekend_nights.
fig = plt.figure(figsize= (10,5))
ax = sns.regplot(x ='no_of_week_nights', y = df1['booking_status'], data = df1)
Cancellations are more likely when more week nights (no_of_week_nights) are booked.
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='required_car_parking_space', y = df1['booking_status'], data = df1)
The chance of a booking being canceled is higher when required_car_parking_space = 0.
fig = plt.figure(figsize= (10,5))
ax = sns.lineplot(x ='lead_time', y = df1['booking_status'], data = df1)
The longer the lead_time, the higher the chance of cancellation.
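A line plot over every individual lead_time value can be noisy, so one way to check this trend is to bin lead_time and compare the cancellation rate per bin. The following is a sketch on toy data; the bin edges and labels are arbitrary choices for illustration, not values from the notebook.

```python
import pandas as pd

# Toy data for illustration only; in the notebook this would be df1
toy = pd.DataFrame({
    "lead_time":      [3, 10, 25, 40, 60, 95, 120, 200, 310],
    "booking_status": [0,  0,  0,  0,  1,  0,   1,   1,   1],  # 1 = canceled
})

# Arbitrary bin edges chosen for this sketch
bins = [0, 30, 90, 180, 600]
labels = ["0-30", "31-90", "91-180", "181-600"]
toy["lead_bin"] = pd.cut(toy["lead_time"], bins=bins, labels=labels, include_lowest=True)

# The mean of a 0/1 target is the share of canceled bookings in each bin
rate_by_bin = toy.groupby("lead_bin", observed=True)["booking_status"].mean()
print(rate_by_bin)
```

If the cancellation rate rises from the first bin to the last on the real data, that would support the observation above.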
fig = plt.figure(figsize= (10,5))
ax = sns.regplot(x ='arrival_year', y = df1['booking_status'], data = df1)
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='no_of_special_requests', y = df1['booking_status'], data = df1)
Bookings are canceled more often when no_of_special_requests is low.
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='booking_status', y = df1['avg_price_per_room'], data = df1)
The avg_price_per_room is higher for bookings with booking_status = canceled.
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='market_segment_type', y = df1['booking_status'], data = df1)
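The highest cancellation rate is seen for market_segment_type = Online and the lowest for market_segment_type = Complementary. Note that the bar height here is the mean of booking_status, i.e. the share of canceled bookings, not a count.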
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='room_type_reserved', y = df1['booking_status'], data = df1)
The highest cancellation rate is seen when Room Type 6 is chosen.
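Because `sns.barplot` with `y = booking_status` plots the mean of a 0/1 column, the bars above show cancellation rates rather than counts. A small sketch on toy data (the values are invented for illustration) shows how the two can be tabulated side by side:

```python
import pandas as pd

# Toy data, not from the hotel dataset
toy = pd.DataFrame({
    "market_segment_type": ["Online", "Online", "Online", "Online",
                            "Offline", "Offline", "Corporate"],
    "booking_status":      [1, 1, 1, 0, 0, 1, 0],  # 1 = canceled
})

# count = bookings, sum = canceled bookings, mean = cancellation rate
summary = (
    toy.groupby("market_segment_type")["booking_status"]
       .agg(bookings="count", cancellations="sum", cancel_rate="mean")
       .sort_values("cancel_rate", ascending=False)
)
print(summary)
```

Reading rate and count together helps avoid over-interpreting categories with very few rows.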
figure = plt.figure(figsize=(8,7))
sns.barplot(x="type_of_meal_plan", y="avg_price_per_room", data=df1, hue='booking_status', palette='tab10' )
plt.show()
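Meal Plan 3 stands out for canceled bookings in this plot. Note that the bars show the mean avg_price_per_room rather than the number of cancellations, and that Meal Plan 3 has only 8 bookings in total.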
figure = plt.figure(figsize=(8,7))
sns.pointplot(x="room_type_reserved", y="avg_price_per_room", data=df1, hue='booking_status', palette='tab10' )
plt.show()
figure = plt.figure(figsize=(8,7))
sns.barplot(x="room_type_reserved", y="avg_price_per_room", data=df1, hue='booking_status', palette='tab10' )
plt.show()
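For all room types, the bar for canceled bookings is higher than the bar for non-canceled bookings. Since the y-axis is avg_price_per_room, this means canceled bookings have a higher average price, not that they are more numerous.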
figure = plt.figure(figsize=(8,7))
sns.pointplot(x="market_segment_type", y="avg_price_per_room", data=df1, hue='booking_status', palette='tab10' )
plt.show()
figure = plt.figure(figsize=(8,7))
sns.barplot(x="market_segment_type", y="avg_price_per_room", data=df1, hue='booking_status', palette='tab10' )
plt.show()
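Across market segments, the bar for Booking_status = Not_Canceled is lower than the bar for Booking_status = Canceled. As the y-axis is avg_price_per_room, this means canceled bookings have the higher average price.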
figure = plt.figure(figsize=(8,7))
sns.pointplot(x="market_segment_type", y="no_of_previous_cancellations", data=df1, hue='booking_status', palette='tab10' )
plt.show()
no_of_previous_cancellations is highest for bookings with booking_status = Not_Canceled in the Complementary and Corporate market segments. For canceled bookings it is lowest across all market segments.
figure = plt.figure(figsize=(8,7))
sns.pointplot(x="type_of_meal_plan", y="no_of_previous_bookings_not_canceled", data=df1, hue='booking_status', palette='tab10' )
plt.show()
no_of_previous_bookings_not_canceled is lowest for canceled bookings across all meal plans, and highest for Meal Plan 3 with booking_status = Not_Canceled.
figure = plt.figure(figsize=(8,7))
sns.barplot(x="type_of_meal_plan", y="no_of_previous_cancellations", data=df1, hue='market_segment_type', palette='tab10' )
plt.show()
no_of_previous_cancellations is lowest for the Online and Offline market segments across all meal plans, and highest for the Complementary and Corporate market segments with Meal Plan 1.
figure = plt.figure(figsize=(8,7))
sns.barplot(x="type_of_meal_plan", y="no_of_special_requests", data=df1, hue='room_type_reserved', palette='tab10' )
plt.show()
Room Type 7 has the most special requests across all meal plans. Meal Plan 3 has the fewest special requests.
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='market_segment_type', y = df1['avg_price_per_room'], data = df1)
def labeled_barplot(data, feature, perc=False, n=None):
    """
    Countplot with a count or percentage label on top of each bar.

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of counts (default is False)
    n: number of category levels to display (default is None, i.e. all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))

    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )

    for p in ax.patches:
        if perc:
            # percentage of each level of the category
            label = "{:.1f}%".format(100 * p.get_height() / total)
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # horizontal centre of the bar
        y = p.get_height()  # height of the bar

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )

    plt.show()
labeled_barplot(df1, "booking_status", perc=True)
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='booking_status', y = df1['repeated_guest'], data = df1)
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='no_of_previous_cancellations', y = df1['repeated_guest'], data = df1)
fig = plt.figure(figsize= (10,5))
ax = sns.barplot(x ='booking_status', y = df1['no_of_special_requests'], data = df1)
The fewer the no_of_special_requests, the higher the chance of a booking being canceled. The more the no_of_special_requests, the higher the chance of the booking not getting canceled.
X = df1.drop(['booking_status'], axis = 1)
y = df1['booking_status']
X.head()
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 |
| 1 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 |
| 2 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 |
| 3 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 |
| 4 | 3 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 277 | 2019 | 7 | 13 | Online | 0 | 0 | 0 | 89.10 | 2 |
y.head()
0    0
1    0
2    1
3    1
4    1
Name: booking_status, dtype: int64
X = pd.get_dummies(X, columns = X.select_dtypes(include = ["object","category"]).columns.tolist(), drop_first = True)
X.head()
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | arrival_date | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 3 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Complementary | market_segment_type_Corporate | market_segment_type_Offline | market_segment_type_Online |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | 0 | 224 | 2017 | 10 | 2 | 0 | 0 | 0 | 65.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 2 | 0 | 2 | 3 | 0 | 5 | 2018 | 11 | 6 | 0 | 0 | 0 | 106.68 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 1 | 0 | 2 | 1 | 0 | 1 | 2018 | 2 | 28 | 0 | 0 | 0 | 60.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 2 | 0 | 0 | 2 | 0 | 211 | 2018 | 5 | 20 | 0 | 0 | 0 | 100.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 3 | 0 | 0 | 3 | 0 | 277 | 2019 | 7 | 13 | 0 | 0 | 0 | 89.10 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
y.head()
0    0
1    0
2    1
3    1
4    1
Name: booking_status, dtype: int64
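With `drop_first=True`, `pd.get_dummies` drops the first level of each categorical column (in sorted order), and that level becomes the baseline against which the remaining dummy columns are interpreted. A minimal sketch on a three-row toy frame (values chosen only for illustration):

```python
import pandas as pd

# Toy frame for illustration; the real call in the notebook runs on X
toy = pd.DataFrame({
    "lead_time": [224, 5, 1],
    "market_segment_type": ["Offline", "Online", "Aviation"],
})

encoded = pd.get_dummies(toy, columns=["market_segment_type"], drop_first=True)

# "Aviation" sorts first, so it is dropped and acts as the reference level
print(list(encoded.columns))
```

This is consistent with the encoded columns shown above, where Meal Plan 1, Room_Type 1 and Aviation have no dummy column of their own.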
# Splitting the data into test and train data
x_train, x_test, y_train, y_test = train_test_split(X,y, test_size = 0.3, random_state = 1)
x_train_stats = x_train.copy()
y_train_stats = y_train.copy()
x_train_sklearn = x_train.copy()
y_train_sklearn = y_train.copy()
from statsmodels.stats.outliers_influence import variance_inflation_factor

def checking_vif(predictors):
    """Return a dataframe with the VIF of every column in predictors."""
    vif = pd.DataFrame()
    vif["feature"] = predictors.columns
    vif["VIF"] = [
        variance_inflation_factor(predictors.values, i)
        for i in range(len(predictors.columns))
    ]
    return vif
checking_vif(x_train_stats)
| | feature | VIF |
|---|---|---|
| 0 | no_of_adults | 20.446326 |
| 1 | no_of_children | 2.273281 |
| 2 | no_of_weekend_nights | 2.189851 |
| 3 | no_of_week_nights | 3.759164 |
| 4 | required_car_parking_space | 1.069135 |
| 5 | lead_time | 2.307859 |
| 6 | arrival_year | 247.465491 |
| 7 | arrival_month | 5.601181 |
| 8 | arrival_date | 4.169750 |
| 9 | repeated_guest | 2.074547 |
| 10 | no_of_previous_cancellations | 1.514191 |
| 11 | no_of_previous_bookings_not_canceled | 1.858973 |
| 12 | avg_price_per_room | 19.176685 |
| 13 | no_of_special_requests | 2.054468 |
| 14 | type_of_meal_plan_Meal Plan 2 | 1.141328 |
| 15 | type_of_meal_plan_Meal Plan 3 | 1.027836 |
| 16 | type_of_meal_plan_Not Selected | 1.588402 |
| 17 | room_type_reserved_Room_Type 2 | 1.111350 |
| 18 | room_type_reserved_Room_Type 3 | 1.001296 |
| 19 | room_type_reserved_Room_Type 4 | 1.823107 |
| 20 | room_type_reserved_Room_Type 5 | 1.138882 |
| 21 | room_type_reserved_Room_Type 6 | 2.160496 |
| 22 | room_type_reserved_Room_Type 7 | 1.173938 |
| 23 | market_segment_type_Complementary | 3.983671 |
| 24 | market_segment_type_Corporate | 11.234312 |
| 25 | market_segment_type_Offline | 31.632538 |
| 26 | market_segment_type_Online | 185.295387 |
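Some of the very large values in this table (for example arrival_year) may be an artifact of computing VIF on a design matrix without a constant column, rather than evidence of real collinearity. The sketch below re-implements VIF as 1 / (1 − R²) in plain NumPy on synthetic, mutually independent columns to illustrate the effect. The helper `vif` and the synthetic data are assumptions made for this illustration; they are not the statsmodels implementation or the hotel data.

```python
import numpy as np

def vif(X, j, add_intercept):
    """VIF of column j: 1 / (1 - R^2) from regressing it on the other columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    if add_intercept:
        others = np.column_stack([np.ones(len(y)), others])
        total_ss = np.sum((y - y.mean()) ** 2)  # centered total sum of squares
    else:
        total_ss = np.sum(y ** 2)               # uncentered, no constant in the model
    beta, *_ = np.linalg.lstsq(others, y, rcond=None)
    resid_ss = np.sum((y - others @ beta) ** 2)
    return total_ss / resid_ss                  # algebraically equal to 1 / (1 - R^2)

rng = np.random.default_rng(0)
n = 2000
# Synthetic, independent columns loosely shaped like year, lead time and price
year = rng.integers(2017, 2020, size=n).astype(float)
lead = rng.exponential(77.0, size=n)
price = rng.normal(112.0, 40.0, size=n)
X = np.column_stack([year, lead, price])

vif_no_const = vif(X, 0, add_intercept=False)
vif_with_const = vif(X, 0, add_intercept=True)
print(round(vif_no_const, 2), round(vif_with_const, 2))
```

Even though the three columns are independent, the year-like column gets a large VIF when no constant is included, and a value close to 1 once a constant is added. A column with a large mean and a small spread is almost a constant itself, so it is well "explained" by any other column with a non-zero mean.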
Both the cases are important.
Recall should be maximized: the greater the recall, the higher the chance of minimizing false negatives, i.e. canceled bookings that the model predicts as not canceled.
# defining a function to compute different metrics to check performance of a classification model built using statsmodels
def model_performance_statsmodel(model, predictors, target, threshold=0.5):
    """
    Compute accuracy, recall, precision and F1 for a statsmodels classifier.

    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    # checking which probabilities are greater than threshold
    pred_temp = model.predict(predictors) > threshold
    # rounding off the above values to get classes
    pred = np.round(pred_temp)

    accuracy_value = accuracy_score(target, pred)  # to compute Accuracy
    recall_value = recall_score(target, pred)  # to compute Recall
    precision_value = precision_score(target, pred)  # to compute Precision
    f1_value = f1_score(target, pred)  # to compute F1-score

    df_perf = pd.DataFrame(
        {
            "Accuracy": accuracy_value,
            "Recall": recall_value,
            "Precision": precision_value,
            "F1": f1_value,
        },
        index=[0],
    )
    return df_perf
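The four metrics returned by this function all follow from the confusion-matrix counts. As a reference, here is a small pure-Python sketch using made-up counts; the helper name `metrics_from_counts` and the numbers are illustrative only.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, recall, precision and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)        # share of actual cancellations that were caught
    precision = tp / (tp + fp)     # share of predicted cancellations that were real
    f1 = 2 * precision * recall / (precision + recall)
    return {"Accuracy": accuracy, "Recall": recall, "Precision": precision, "F1": f1}

# Made-up counts: tp = canceled and predicted canceled, fn = canceled but missed
m = metrics_from_counts(tp=60, fp=20, fn=40, tn=180)
print(m)
```

Recall only involves `tp` and `fn`, which is why maximizing it targets missed cancellations specifically.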
# defining a function to plot the confusion_matrix of a classification model
def confusion_matrix_for_statsmodel(model, predictors, target, threshold=0.5):
    """
    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    y_pred = model.predict(predictors) > threshold
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)

    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
# There are different solvers available. The newton-cg solver is faster for higher-dimensional data.
lg = LogisticRegression(solver="newton-cg", random_state=1)
model = lg.fit(x_train_sklearn, y_train_sklearn)
# predict on training set
y_pred_train = lg.predict(x_train_sklearn)
print("Training set performance:")
print("Accuracy:", accuracy_score(y_train_sklearn, y_pred_train))
print("Precision:", precision_score(y_train_sklearn, y_pred_train))
print("Recall:", recall_score(y_train_sklearn, y_pred_train))
print("F1:", f1_score(y_train_sklearn, y_pred_train))
Training set performance:
Accuracy: 0.7927054323390262
Precision: 0.7317188422917897
Recall: 0.6132066132066132
F1: 0.6672411935796617
# predict on the test set
y_pred_test = lg.predict(x_test)
print("Test set performance:")
print("Accuracy:", accuracy_score(y_test, y_pred_test))
print("Precision:", precision_score(y_test, y_pred_test))
print("Recall:", recall_score(y_test, y_pred_test))
print("F1:", f1_score(y_test, y_pred_test))
Test set performance:
Accuracy: 0.7905738667501762
Precision: 0.7353507565337002
Recall: 0.6094391244870041
F1: 0.6665004363545692
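Recall is about 0.61 on both sets at the default 0.5 threshold. Since recall is the metric of interest, one option is to lower the classification threshold, which typically trades precision for recall. The sketch below uses invented probabilities to show the mechanism; with the scikit-learn model above, the probabilities would presumably come from `lg.predict_proba(x_test)[:, 1]`.

```python
import numpy as np

def recall_precision_at(y_true, proba, threshold):
    """Recall and precision when predicting class 1 for proba > threshold."""
    y_true = np.asarray(y_true)
    pred = np.asarray(proba) > threshold
    tp = np.sum(pred & (y_true == 1))
    fn = np.sum(~pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    return tp / (tp + fn), tp / (tp + fp)

# Invented labels and predicted probabilities, for illustration only
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
proba = [0.9, 0.6, 0.4, 0.35, 0.45, 0.2, 0.1, 0.05]

recall_50, precision_50 = recall_precision_at(y_true, proba, 0.5)
recall_30, precision_30 = recall_precision_at(y_true, proba, 0.3)
print(recall_50, precision_50)
print(recall_30, precision_30)
```

In this toy case lowering the threshold from 0.5 to 0.3 raises recall while precision drops; whether that trade is worthwhile depends on the relative cost of missed cancellations and false alarms.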
X = df1.drop(["booking_status"], axis=1)
Y = df1["booking_status"]
X.head()
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 |
| 1 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 |
| 2 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 |
| 3 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 |
| 4 | 3 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 277 | 2019 | 7 | 13 | Online | 0 | 0 | 0 | 89.10 | 2 |
X = pd.get_dummies(X, drop_first=True)
X.head()
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | arrival_date | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 3 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Complementary | market_segment_type_Corporate | market_segment_type_Offline | market_segment_type_Online |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | 0 | 224 | 2017 | 10 | 2 | 0 | 0 | 0 | 65.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 2 | 0 | 2 | 3 | 0 | 5 | 2018 | 11 | 6 | 0 | 0 | 0 | 106.68 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 1 | 0 | 2 | 1 | 0 | 1 | 2018 | 2 | 28 | 0 | 0 | 0 | 60.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 2 | 0 | 0 | 2 | 0 | 211 | 2018 | 5 | 20 | 0 | 0 | 0 | 100.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 3 | 0 | 0 | 3 | 0 | 277 | 2019 | 7 | 13 | 0 | 0 | 0 | 89.10 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
# adding constant
X = sm.add_constant(X)
# Splitting data in train and test set
X_train, X_test, y_train, y_test = train_test_split(
X, Y, test_size=0.30, random_state=1
)
# fit the logistic regression model
logit = sm.Logit(y_train, X_train.astype(float))
lg = logit.fit(disp=False)
print(lg.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 29803
Model: Logit Df Residuals: 29775
Method: MLE Df Model: 27
Date: Fri, 17 Sep 2021 Pseudo R-squ.: 0.3293
Time: 22:47:37 Log-Likelihood: -12799.
converged: False LL-Null: -19083.
Covariance Type: nonrobust LLR p-value: 0.000
========================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------------
const 91.4613 68.734 1.331 0.183 -43.256 226.178
no_of_adults -0.0320 0.035 -0.904 0.366 -0.102 0.037
no_of_children 0.1129 0.048 2.375 0.018 0.020 0.206
no_of_weekend_nights 0.0337 0.018 1.874 0.061 -0.002 0.069
no_of_week_nights 0.0732 0.011 6.885 0.000 0.052 0.094
required_car_parking_space -1.5124 0.116 -13.007 0.000 -1.740 -1.285
lead_time 0.0168 0.000 62.116 0.000 0.016 0.017
arrival_year -0.0468 0.034 -1.373 0.170 -0.114 0.020
arrival_month -0.0416 0.007 -6.146 0.000 -0.055 -0.028
arrival_date -0.0029 0.002 -1.650 0.099 -0.006 0.001
repeated_guest -3.0309 0.594 -5.099 0.000 -4.196 -1.866
no_of_previous_cancellations 0.2225 0.096 2.307 0.021 0.033 0.412
no_of_previous_bookings_not_canceled -0.0098 0.053 -0.186 0.852 -0.113 0.093
avg_price_per_room 0.0168 0.001 25.746 0.000 0.015 0.018
no_of_special_requests -1.2915 0.024 -54.838 0.000 -1.338 -1.245
type_of_meal_plan_Meal Plan 2 -0.1580 0.080 -1.982 0.048 -0.314 -0.002
type_of_meal_plan_Meal Plan 3 2.3659 4.29e+04 5.51e-05 1.000 -8.41e+04 8.42e+04
type_of_meal_plan_Not Selected 0.3612 0.043 8.413 0.000 0.277 0.445
room_type_reserved_Room_Type 2 -0.1851 0.127 -1.454 0.146 -0.435 0.064
room_type_reserved_Room_Type 3 0.3624 1.334 0.272 0.786 -2.251 2.976
room_type_reserved_Room_Type 4 -0.1271 0.044 -2.861 0.004 -0.214 -0.040
room_type_reserved_Room_Type 5 -0.2723 0.112 -2.422 0.015 -0.493 -0.052
room_type_reserved_Room_Type 6 -0.5080 0.120 -4.250 0.000 -0.742 -0.274
room_type_reserved_Room_Type 7 -0.7297 0.201 -3.634 0.000 -1.123 -0.336
market_segment_type_Complementary -20.6061 4771.099 -0.004 0.997 -9371.787 9330.575
market_segment_type_Corporate -0.3716 0.274 -1.357 0.175 -0.908 0.165
market_segment_type_Offline -1.9833 0.260 -7.616 0.000 -2.494 -1.473
market_segment_type_Online 0.2361 0.254 0.930 0.352 -0.262 0.734
========================================================================================================
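Logit coefficients are on the log-odds scale, so exponentiating them gives odds ratios that are easier to read. The sketch below copies a few coefficients from the summary above into a dictionary by hand; with the fitted model one would typically exponentiate `lg.params` instead. Because the summary reports `converged: False`, these numbers should be treated as indicative only.

```python
import math

# Coefficients copied by hand from the summary above
coefs = {
    "lead_time": 0.0168,
    "avg_price_per_room": 0.0168,
    "no_of_special_requests": -1.2915,
    "required_car_parking_space": -1.5124,
}

# exp(coef) = multiplicative change in the odds of cancellation per unit increase
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}
pct_change = {name: 100 * (ratio - 1) for name, ratio in odds_ratios.items()}

for name in coefs:
    print(f"{name}: odds ratio {odds_ratios[name]:.4f} ({pct_change[name]:+.1f}% odds)")
```

Read this way, each extra day of lead time raises the odds of cancellation by roughly 1.7%, while each additional special request cuts the odds by roughly 70%, holding the other variables fixed.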
print("Training performance:")
model_performance_statsmodel(lg, X_train, y_train)
Training performance:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.792873 | 0.613801 | 0.731822 | 0.667636 |
There is a need to remove multicollinearity, which affects the p-values.
vif_series = pd.Series(
[variance_inflation_factor(X_train.values, i) for i in range(X_train.shape[1])],
index=X_train.columns,
dtype=float,
)
print("Series before feature selection: \n\n{}\n".format(vif_series))
Series before feature selection:

const                                   1.859237e+07
no_of_adults                            1.449389e+00
no_of_children                          2.075427e+00
no_of_weekend_nights                    1.083180e+00
no_of_week_nights                       1.130019e+00
required_car_parking_space              1.035963e+00
lead_time                               1.331931e+00
arrival_year                            1.797321e+00
arrival_month                           1.547549e+00
arrival_date                            1.004898e+00
repeated_guest                          2.016007e+00
no_of_previous_cancellations            1.509257e+00
no_of_previous_bookings_not_canceled    1.846098e+00
avg_price_per_room                      2.625662e+00
no_of_special_requests                  1.111299e+00
type_of_meal_plan_Meal Plan 2           1.098579e+00
type_of_meal_plan_Meal Plan 3           1.027720e+00
type_of_meal_plan_Not Selected          1.332580e+00
room_type_reserved_Room_Type 2          1.095587e+00
room_type_reserved_Room_Type 3          1.001228e+00
room_type_reserved_Room_Type 4          1.425579e+00
room_type_reserved_Room_Type 5          1.116252e+00
room_type_reserved_Room_Type 6          2.103918e+00
room_type_reserved_Room_Type 7          1.167633e+00
market_segment_type_Complementary       3.942072e+00
market_segment_type_Corporate           1.073452e+01
market_segment_type_Offline             2.744181e+01
market_segment_type_Online              3.647704e+01
dtype: float64
print(lg.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 29803
Model: Logit Df Residuals: 29775
Method: MLE Df Model: 27
Date: Fri, 17 Sep 2021 Pseudo R-squ.: 0.3293
Time: 22:47:39 Log-Likelihood: -12799.
converged: False LL-Null: -19083.
Covariance Type: nonrobust LLR p-value: 0.000
========================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------------
const 91.4613 68.734 1.331 0.183 -43.256 226.178
no_of_adults -0.0320 0.035 -0.904 0.366 -0.102 0.037
no_of_children 0.1129 0.048 2.375 0.018 0.020 0.206
no_of_weekend_nights 0.0337 0.018 1.874 0.061 -0.002 0.069
no_of_week_nights 0.0732 0.011 6.885 0.000 0.052 0.094
required_car_parking_space -1.5124 0.116 -13.007 0.000 -1.740 -1.285
lead_time 0.0168 0.000 62.116 0.000 0.016 0.017
arrival_year -0.0468 0.034 -1.373 0.170 -0.114 0.020
arrival_month -0.0416 0.007 -6.146 0.000 -0.055 -0.028
arrival_date -0.0029 0.002 -1.650 0.099 -0.006 0.001
repeated_guest -3.0309 0.594 -5.099 0.000 -4.196 -1.866
no_of_previous_cancellations 0.2225 0.096 2.307 0.021 0.033 0.412
no_of_previous_bookings_not_canceled -0.0098 0.053 -0.186 0.852 -0.113 0.093
avg_price_per_room 0.0168 0.001 25.746 0.000 0.015 0.018
no_of_special_requests -1.2915 0.024 -54.838 0.000 -1.338 -1.245
type_of_meal_plan_Meal Plan 2 -0.1580 0.080 -1.982 0.048 -0.314 -0.002
type_of_meal_plan_Meal Plan 3 2.3659 4.29e+04 5.51e-05 1.000 -8.41e+04 8.42e+04
type_of_meal_plan_Not Selected 0.3612 0.043 8.413 0.000 0.277 0.445
room_type_reserved_Room_Type 2 -0.1851 0.127 -1.454 0.146 -0.435 0.064
room_type_reserved_Room_Type 3 0.3624 1.334 0.272 0.786 -2.251 2.976
room_type_reserved_Room_Type 4 -0.1271 0.044 -2.861 0.004 -0.214 -0.040
room_type_reserved_Room_Type 5 -0.2723 0.112 -2.422 0.015 -0.493 -0.052
room_type_reserved_Room_Type 6 -0.5080 0.120 -4.250 0.000 -0.742 -0.274
room_type_reserved_Room_Type 7 -0.7297 0.201 -3.634 0.000 -1.123 -0.336
market_segment_type_Complementary -20.6061 4771.099 -0.004 0.997 -9371.787 9330.575
market_segment_type_Corporate -0.3716 0.274 -1.357 0.175 -0.908 0.165
market_segment_type_Offline -1.9833 0.260 -7.616 0.000 -2.494 -1.473
market_segment_type_Online 0.2361 0.254 0.930 0.352 -0.262 0.734
========================================================================================================
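Some of the variables have a p-value > 0.05. These variables can be dropped one at a time, refitting the model after each removal, since the remaining p-values change when a variable is removed.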
#Since, p-value for market_segment_type_Complementary >0.05
X_train1 = X_train.drop(
["market_segment_type_Complementary"], axis=1
)
logit1 = sm.Logit(y_train, X_train1.astype(float))
lg1 = logit1.fit()
print(lg1.summary())
Warning: Maximum number of iterations has been exceeded.
Current function value: 0.429843
Iterations: 35
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 29803
Model: Logit Df Residuals: 29776
Method: MLE Df Model: 26
Date: Fri, 17 Sep 2021 Pseudo R-squ.: 0.3287
Time: 22:47:40 Log-Likelihood: -12811.
converged: False LL-Null: -19083.
Covariance Type: nonrobust LLR p-value: 0.000
========================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------------
const 94.0777 68.671 1.370 0.171 -40.515 228.671
no_of_adults -0.0382 0.035 -1.081 0.280 -0.107 0.031
no_of_children 0.1097 0.048 2.309 0.021 0.017 0.203
no_of_weekend_nights 0.0348 0.018 1.936 0.053 -0.000 0.070
no_of_week_nights 0.0747 0.011 7.026 0.000 0.054 0.096
required_car_parking_space -1.5122 0.116 -13.010 0.000 -1.740 -1.284
lead_time 0.0168 0.000 62.150 0.000 0.016 0.017
arrival_year -0.0483 0.034 -1.419 0.156 -0.115 0.018
arrival_month -0.0425 0.007 -6.278 0.000 -0.056 -0.029
arrival_date -0.0029 0.002 -1.669 0.095 -0.006 0.001
repeated_guest -2.9906 0.596 -5.016 0.000 -4.159 -1.822
no_of_previous_cancellations 0.2198 0.096 2.280 0.023 0.031 0.409
no_of_previous_bookings_not_canceled -0.0110 0.053 -0.206 0.836 -0.115 0.093
avg_price_per_room 0.0170 0.001 26.230 0.000 0.016 0.018
no_of_special_requests -1.2918 0.024 -54.878 0.000 -1.338 -1.246
type_of_meal_plan_Meal Plan 2 -0.1694 0.080 -2.126 0.034 -0.326 -0.013
type_of_meal_plan_Meal Plan 3 -13.1988 4430.434 -0.003 0.998 -8696.689 8670.291
type_of_meal_plan_Not Selected 0.3641 0.043 8.483 0.000 0.280 0.448
room_type_reserved_Room_Type 2 -0.1828 0.127 -1.437 0.151 -0.432 0.067
room_type_reserved_Room_Type 3 0.2969 1.291 0.230 0.818 -2.234 2.828
room_type_reserved_Room_Type 4 -0.1269 0.044 -2.857 0.004 -0.214 -0.040
room_type_reserved_Room_Type 5 -0.2810 0.112 -2.502 0.012 -0.501 -0.061
room_type_reserved_Room_Type 6 -0.5183 0.119 -4.338 0.000 -0.752 -0.284
room_type_reserved_Room_Type 7 -0.7480 0.201 -3.727 0.000 -1.141 -0.355
market_segment_type_Corporate 0.1044 0.269 0.388 0.698 -0.424 0.632
market_segment_type_Offline -1.5005 0.255 -5.884 0.000 -2.000 -1.001
market_segment_type_Online 0.7117 0.249 2.856 0.004 0.223 1.200
========================================================================================================
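The Pseudo R-squ. value in the summary above is McFadden's pseudo R-squared, which can be recovered from the two log-likelihoods the summary reports. A quick check, using only the rounded values printed above (so the last digit may differ slightly from the unrounded result):

```python
# McFadden's pseudo R-squared: 1 - (log-likelihood of the model / log-likelihood of the null model)
ll_model = -12811.0  # "Log-Likelihood" as printed in the summary (rounded)
ll_null = -19083.0   # "LL-Null" as printed in the summary (rounded)

pseudo_r2 = 1 - ll_model / ll_null
print(round(pseudo_r2, 4))
```

A value near 0.33 means the fitted model improves the log-likelihood by roughly a third relative to an intercept-only model.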
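There are still variables whose p-value is above 0.05. The loop below refits the model repeatedly, dropping the variable with the highest p-value each time, until every remaining p-value is at or below 0.05.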
# running a loop to drop variables with high p-value
# initial list of columns
cols = X_train1.columns.tolist()
# setting an initial max p-value
max_p_value = 1
while len(cols) > 0:
    # defining the train set
    X_train_aux = X_train1[cols]
    # fitting the model
    model = sm.Logit(y_train, X_train_aux).fit(disp=False)
    # getting the p-values and the maximum p-value
    p_values = model.pvalues
    max_p_value = max(p_values)
    # name of the variable with maximum p-value
    feature_with_p_max = p_values.idxmax()
    if max_p_value > 0.05:
        cols.remove(feature_with_p_max)
    else:
        break
selected_features = cols
print(selected_features)
['no_of_children', 'no_of_week_nights', 'required_car_parking_space', 'lead_time', 'arrival_year', 'arrival_month', 'repeated_guest', 'no_of_previous_cancellations', 'avg_price_per_room', 'no_of_special_requests', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Offline', 'market_segment_type_Online']
X_train2 = X_train1[selected_features]
logit2 = sm.Logit(y_train, X_train2.astype(float))
lg2 = logit2.fit(disp=False)
print(lg2.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 29803
Model: Logit Df Residuals: 29786
Method: MLE Df Model: 16
Date: Fri, 17 Sep 2021 Pseudo R-squ.: 0.3283
Time: 22:47:43 Log-Likelihood: -12818.
converged: True LL-Null: -19083.
Covariance Type: nonrobust LLR p-value: 0.000
==================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------
no_of_children 0.1040 0.046 2.280 0.023 0.015 0.193
no_of_week_nights 0.0781 0.010 7.531 0.000 0.058 0.098
required_car_parking_space -1.5107 0.116 -13.008 0.000 -1.738 -1.283
lead_time 0.0166 0.000 66.053 0.000 0.016 0.017
arrival_year -0.0017 5.83e-05 -28.439 0.000 -0.002 -0.002
arrival_month -0.0365 0.006 -6.511 0.000 -0.047 -0.025
repeated_guest -3.0469 0.552 -5.517 0.000 -4.129 -1.964
no_of_previous_cancellations 0.2178 0.095 2.298 0.022 0.032 0.403
avg_price_per_room 0.0163 0.001 29.189 0.000 0.015 0.017
no_of_special_requests -1.2929 0.024 -54.990 0.000 -1.339 -1.247
type_of_meal_plan_Not Selected 0.3544 0.041 8.582 0.000 0.273 0.435
room_type_reserved_Room_Type 4 -0.1252 0.043 -2.910 0.004 -0.210 -0.041
room_type_reserved_Room_Type 5 -0.2865 0.112 -2.558 0.011 -0.506 -0.067
room_type_reserved_Room_Type 6 -0.4636 0.116 -3.983 0.000 -0.692 -0.235
room_type_reserved_Room_Type 7 -0.6971 0.198 -3.518 0.000 -1.086 -0.309
market_segment_type_Offline -1.6072 0.119 -13.562 0.000 -1.840 -1.375
market_segment_type_Online 0.6216 0.106 5.872 0.000 0.414 0.829
==================================================================================================
All the remaining variables have a p-value below 0.05. Note that the loop also dropped the constant term, so lg2 is fit without an intercept.
# converting coefficients to odds
odds = np.exp(lg2.params)
# finding the percentage change
perc_change_odds = (np.exp(lg2.params) - 1) * 100
# removing limit from number of columns to display
pd.set_option("display.max_columns", None)
# adding the odds to a dataframe
pd.DataFrame({"Odds": odds, "Change_odd%": perc_change_odds}, index=X_train2.columns).T
| no_of_children | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | repeated_guest | no_of_previous_cancellations | avg_price_per_room | no_of_special_requests | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Offline | market_segment_type_Online | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Odds | 1.109639 | 1.081199 | 0.220752 | 1.016761 | 0.998344 | 0.964202 | 0.047508 | 1.243277 | 1.016399 | 0.274469 | 1.425309 | 0.882285 | 0.750884 | 0.629003 | 0.498018 | 0.200443 | 1.861920 |
| Change_odd% | 10.963850 | 8.119948 | -77.924848 | 1.676082 | -0.165559 | -3.579829 | -95.249228 | 24.327738 | 1.639950 | -72.553051 | 42.530875 | -11.771462 | -24.911557 | -37.099658 | -50.198167 | -79.955668 | 86.192036 |
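To make the table above concrete, here is the arithmetic for one coefficient worked by hand. The sketch uses the rounded `lead_time` coefficient from the lg2 summary, so the result differs from the table in the last digits; the 30-day comparison is an illustrative choice and was not computed in the original analysis.

```python
import math

coef_lead_time = 0.0166  # rounded coefficient from the lg2 summary

# odds ratio for a one-day increase in lead time, other variables held fixed
odds_ratio = math.exp(coef_lead_time)
# percentage change in the odds of cancellation per extra day
pct_change = (odds_ratio - 1) * 100

# the effect compounds multiplicatively: 30 extra days multiply the odds by exp(30 * coef)
odds_ratio_30_days = math.exp(30 * coef_lead_time)

print(f"per day: x{odds_ratio:.4f} ({pct_change:.2f}%)")
print(f"per 30 days: x{odds_ratio_30_days:.2f}")
```

A per-day change of under 2% looks small, but because it compounds, a booking made a month earlier has noticeably higher odds of cancellation under this model.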
# creating confusion matrix
confusion_matrix_for_statsmodel(lg2, X_train2, y_train)
log_reg_model_train_perf = model_performance_statsmodel(
lg2, X_train2, y_train
)
print("Training performance:")
log_reg_model_train_perf
Training performance:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.79341 | 0.613306 | 0.733483 | 0.668033 |
logit_roc_auc_train = roc_auc_score(y_train, lg2.predict(X_train2))
fpr, tpr, thresholds = roc_curve(y_train, lg2.predict(X_train2))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
Let's check whether the F1 score can be improved by changing the classification threshold, using the ROC curve to choose it.
# Optimal threshold as per AUC-ROC curve
# The optimal cut off would be where tpr is high and fpr is low
fpr, tpr, thresholds = roc_curve(y_train, lg2.predict(X_train2))
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold_auc_roc = thresholds[optimal_idx]
print(optimal_threshold_auc_roc)
0.3068474177339848
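The `np.argmax(tpr - fpr)` rule used above maximizes Youden's J statistic (TPR minus FPR). The sketch below reproduces the idea with plain NumPy on a tiny made-up set of labels and scores; the data and the function name `youden_threshold` are illustrative assumptions and not part of the original notebook.

```python
import numpy as np

def youden_threshold(y_true, scores):
    """Return the score threshold that maximizes TPR - FPR (Youden's J)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    best_j, best_t = -1.0, None
    # every distinct score is a candidate threshold; predict 1 when score >= t
    for t in np.unique(scores):
        pred = scores >= t
        tpr = np.sum(pred & (y_true == 1)) / np.sum(y_true == 1)
        fpr = np.sum(pred & (y_true == 0)) / np.sum(y_true == 0)
        if tpr - fpr > best_j:
            best_j, best_t = tpr - fpr, t
    return best_t, best_j

# made-up example: positives tend to receive higher scores
y = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
s = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
threshold, j = youden_threshold(y, s)
print(threshold, j)
```

This criterion weights false positives and false negatives equally, which is why it tends to pull the threshold below 0.5 when cancellations are the minority class.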
# creating confusion matrix
confusion_matrix_for_statsmodel(
lg2, X_train2, y_train, threshold=optimal_threshold_auc_roc
)
# checking model performance for this model
log_reg_model_train_perf_threshold_auc_roc = model_performance_statsmodel(
lg2, X_train2, y_train, threshold=optimal_threshold_auc_roc
)
print("Training performance:")
log_reg_model_train_perf_threshold_auc_roc
Training performance:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.772506 | 0.806752 | 0.627957 | 0.706214 |
Recall has improved substantially, from 0.613 to 0.807, and F1 rose from 0.668 to 0.706. Precision fell from 0.733 to 0.628 and accuracy from 0.793 to 0.773.
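F1 is the harmonic mean of precision and recall, which is why it can rise even though precision drops. A small check against the two training performance tables above; the helper name `f1_from` is hypothetical and used only for this illustration.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# first training table (no threshold argument passed)
f1_default = f1_from(precision=0.733483, recall=0.613306)
# second training table (threshold chosen from the ROC curve)
f1_roc = f1_from(precision=0.627957, recall=0.806752)

print(round(f1_default, 4), round(f1_roc, 4))
```

The harmonic mean rewards balance: the second model's precision and recall are closer to each other around a similar average, so its F1 is higher.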
y_scores = lg2.predict(X_train2)
prec, rec, tre = precision_recall_curve(y_train, y_scores,)
def plot_prec_recall_vs_tresh(precisions, recalls, thresholds):
    # precisions and recalls have one more element than thresholds, so drop the last point
    plt.plot(thresholds, precisions[:-1], "b--", label="precision")
    plt.plot(thresholds, recalls[:-1], "g--", label="recall")
    plt.xlabel("Threshold")
    plt.legend(loc="upper left")
    plt.ylim([0, 1])
plt.figure(figsize=(10, 7))
plot_prec_recall_vs_tresh(prec, rec, tre)
plt.show()
# setting the threshold
optimal_threshold_curve = 0.40
# creating confusion matrix
confusion_matrix_for_statsmodel(lg2, X_train2, y_train, threshold=optimal_threshold_curve)
log_reg_model_train_perf_threshold_curve = model_performance_statsmodel(
lg2, X_train2, y_train, threshold=optimal_threshold_curve
)
print("Training performance:")
log_reg_model_train_perf_threshold_curve
Training performance:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.788109 | 0.714088 | 0.677914 | 0.695531 |
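The 0.40 threshold above is set by hand. If the aim is to balance precision and recall, one way to choose it programmatically is to pick the threshold where the two curves are closest. This is a sketch under that assumption, using arrays shaped like the output of `precision_recall_curve`; the function name `balanced_threshold` and the toy arrays are hypothetical.

```python
import numpy as np

def balanced_threshold(precisions, recalls, thresholds):
    """Return the threshold at which |precision - recall| is smallest.

    Expects arrays shaped like the output of sklearn's precision_recall_curve,
    where precisions and recalls have one more element than thresholds.
    """
    precisions = np.asarray(precisions)[:-1]
    recalls = np.asarray(recalls)[:-1]
    idx = np.argmin(np.abs(precisions - recalls))
    return np.asarray(thresholds)[idx]

# toy curves: precision rises and recall falls as the threshold increases
prec = [0.50, 0.60, 0.70, 0.80, 1.00]
rec = [1.00, 0.90, 0.72, 0.50, 0.00]
thr = [0.20, 0.30, 0.40, 0.50]
print(balanced_threshold(prec, rec, thr))
```

With the notebook's own variables, the call would be `balanced_threshold(prec, rec, tre)`; the result can then be compared with the value read off the plot.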
# training performance comparison
models_train_comp_df = pd.concat(
[
log_reg_model_train_perf.T,
log_reg_model_train_perf_threshold_auc_roc.T,
log_reg_model_train_perf_threshold_curve.T,
],
axis=1,
)
models_train_comp_df.columns = [
"Logistic Regression statsmodel",
"Logistic Regression-0.307 Threshold",
"Logistic Regression-0.40 Threshold",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| Logistic Regression statsmodel | Logistic Regression-0.307 Threshold | Logistic Regression-0.40 Threshold | |
|---|---|---|---|
| Accuracy | 0.793410 | 0.772506 | 0.788109 |
| Recall | 0.613306 | 0.806752 | 0.714088 |
| Precision | 0.733483 | 0.627957 | 0.677914 |
| F1 | 0.668033 | 0.706214 | 0.695531 |
X_test2 = X_test[list(X_train2.columns)].astype(float)
# creating confusion matrix
confusion_matrix_for_statsmodel(lg2, X_test2, y_test)
log_reg_model_test_perf = model_performance_statsmodel(
lg2, X_test2, y_test
)
print("Test performance:")
log_reg_model_test_perf
Test performance:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.79073 | 0.610123 | 0.735367 | 0.666916 |
logit_roc_auc_test = roc_auc_score(y_test, lg2.predict(X_test2))
fpr, tpr, thresholds = roc_curve(y_test, lg2.predict(X_test2))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_test)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
# creating confusion matrix
confusion_matrix_for_statsmodel(lg2, X_test2, y_test, threshold=optimal_threshold_auc_roc)
# checking model performance for this model
log_reg_model_test_perf_threshold_auc_roc = model_performance_statsmodel(
lg2, X_test2, y_test, threshold=optimal_threshold_auc_roc
)
print("Test performance:")
log_reg_model_test_perf_threshold_auc_roc
Test performance:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.771549 | 0.808254 | 0.630559 | 0.708433 |
# creating confusion matrix
confusion_matrix_for_statsmodel(lg2, X_test2, y_test, threshold=optimal_threshold_curve)
log_reg_model_test_perf_threshold_curve = model_performance_statsmodel(
lg2, X_test2, y_test, threshold=optimal_threshold_curve
)
print("Test performance:")
log_reg_model_test_perf_threshold_curve
Test performance:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.791044 | 0.721158 | 0.686266 | 0.70328 |
# training performance comparison
models_train_comp_df = pd.concat(
[
log_reg_model_train_perf.T,
log_reg_model_train_perf_threshold_auc_roc.T,
log_reg_model_train_perf_threshold_curve.T,
],
axis=1,
)
models_train_comp_df.columns = [
"Logistic Regression statsmodel",
"Logistic Regression-0.307 Threshold",
"Logistic Regression-0.40 Threshold",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| Logistic Regression statsmodel | Logistic Regression-0.307 Threshold | Logistic Regression-0.40 Threshold | |
|---|---|---|---|
| Accuracy | 0.793410 | 0.772506 | 0.788109 |
| Recall | 0.613306 | 0.806752 | 0.714088 |
| Precision | 0.733483 | 0.627957 | 0.677914 |
| F1 | 0.668033 | 0.706214 | 0.695531 |
# testing performance comparison
models_test_comp_df = pd.concat(
[
log_reg_model_test_perf.T,
log_reg_model_test_perf_threshold_auc_roc.T,
log_reg_model_test_perf_threshold_curve.T,
],
axis=1,
)
models_test_comp_df.columns = [
"Logistic Regression statsmodel",
"Logistic Regression-0.307 Threshold",
"Logistic Regression-0.40 Threshold",
]
print("Test set performance comparison:")
models_test_comp_df
Test set performance comparison:
| Logistic Regression statsmodel | Logistic Regression-0.307 Threshold | Logistic Regression-0.40 Threshold | |
|---|---|---|---|
| Accuracy | 0.790730 | 0.771549 | 0.791044 |
| Recall | 0.610123 | 0.808254 | 0.721158 |
| Precision | 0.735367 | 0.630559 | 0.686266 |
| F1 | 0.666916 | 0.708433 | 0.703280 |
dTree = DecisionTreeClassifier(criterion = 'gini', random_state=1)
dTree.fit(X_train, y_train)
DecisionTreeClassifier(random_state=1)
print("Accuracy on training set : ",dTree.score(X_train, y_train))
print("Accuracy on test set : ",dTree.score(X_test, y_test))
Accuracy on training set : 0.9967117404288159
Accuracy on test set : 0.7893212244578408
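The tree does extremely well on the training data (accuracy 0.997) but noticeably worse on the test data (0.789), which indicates overfitting.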
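The tree above is grown with `criterion='gini'`, so each split is chosen to reduce Gini impurity. As a reminder of what that quantity measures, here is a minimal sketch for a single node; the function name `gini_impurity` and the example counts are made up for illustration.

```python
def gini_impurity(class_counts):
    """Gini impurity of a node: 1 - sum(p_k ** 2) over the class proportions p_k."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

print(gini_impurity([50, 50]))   # evenly mixed node
print(gini_impurity([100, 0]))   # pure node
print(gini_impurity([90, 10]))   # mostly one class
```

An unrestricted tree keeps splitting until its leaves are pure (impurity 0), which is consistent with the near-perfect training accuracy seen here.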
#Checking number of positives
y.sum(axis = 0)
14487
## Function to create confusion matrix
def make_confusion_matrix_for_model(model, y_actual, labels=[1, 0]):
    # Note: predictions are always made on the global X_test, and the `labels`
    # argument is not used; the matrix rows and columns are ordered as [0, 1].
    y_predict = model.predict(X_test)
    cm = metrics.confusion_matrix(y_actual, y_predict, labels=[0, 1])
    df_cm = pd.DataFrame(
        cm,
        index=["Actual - No", "Actual - Yes"],
        columns=["Predicted - No", "Predicted - Yes"],
    )
    group_counts = ["{0:0.0f}".format(value) for value in cm.flatten()]
    group_percentages = ["{0:.2%}".format(value) for value in cm.flatten() / np.sum(cm)]
    # cell annotations: count on the first line, share of all samples on the second
    annot = [f"{v1}\n{v2}" for v1, v2 in zip(group_counts, group_percentages)]
    annot = np.asarray(annot).reshape(2, 2)
    plt.figure(figsize=(10, 7))
    sns.heatmap(df_cm, annot=annot, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
## Function to calculate recall score
def get_recall_value(model):
    predict_train = model.predict(X_train)
    predict_test = model.predict(X_test)
    print("Recall on training set : ", metrics.recall_score(y_train, predict_train))
    print("Recall on test set : ", metrics.recall_score(y_test, predict_test))
## Function to calculate F1 score
def get_f1_value(model):
    predict_train = model.predict(X_train)
    predict_test = model.predict(X_test)
    print("f1 on training set : ", metrics.f1_score(y_train, predict_train))
    print("f1 on test set : ", metrics.f1_score(y_test, predict_test))
make_confusion_matrix_for_model(dTree,y_test)
# Accuracy on train and test
print("Accuracy value of training set : ",dTree.score(X_train, y_train))
print("Accuracy value of test set : ",dTree.score(X_test, y_test))
Accuracy value of training set : 0.9967117404288159
Accuracy value of test set : 0.7893212244578408
# Recall on train and test
get_recall_value(dTree)
Recall on training set : 0.9902979902979903
Recall on test set : 0.6981304149566804
# F1 on train and test
get_f1_value(dTree)
f1 on training set : 0.995125348189415
f1 on test set : 0.6947249007373795
feature_names = list(X.columns)
print(feature_names)
['const', 'no_of_adults', 'no_of_children', 'no_of_weekend_nights', 'no_of_week_nights', 'required_car_parking_space', 'lead_time', 'arrival_year', 'arrival_month', 'arrival_date', 'repeated_guest', 'no_of_previous_cancellations', 'no_of_previous_bookings_not_canceled', 'avg_price_per_room', 'no_of_special_requests', 'type_of_meal_plan_Meal Plan 2', 'type_of_meal_plan_Meal Plan 3', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 3', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Complementary', 'market_segment_type_Corporate', 'market_segment_type_Offline', 'market_segment_type_Online']
#plt.figure(figsize=(20,30))
#tree.plot_tree(dTree,feature_names=feature_names,filled=True,fontsize=9,node_ids=True,class_names=True)
#plt.show()
# Text report showing the rules of a decision tree -
print(tree.export_text(dTree,feature_names=feature_names,show_weights=True))
|--- lead_time <= 150.50 | |--- no_of_special_requests <= 0.50 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 91.50 | | | | |--- avg_price_per_room <= 209.28 | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | |--- avg_price_per_room <= 75.54 | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | |--- lead_time <= 17.50 | | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- lead_time > 17.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- lead_time <= 67.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- lead_time > 67.50 | | | | | | | | | | | |--- weights: [17.00, 0.00] class: 0 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- arrival_date <= 15.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- arrival_date > 15.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | |--- arrival_month > 9.50 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- repeated_guest <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- repeated_guest > 0.50 | | | | | | | | | | | |--- weights: [46.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [87.00, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- lead_time <= 65.50 | | | | | | | | | | |--- weights: [249.00, 0.00] 
class: 0 | | | | | | | | | |--- lead_time > 65.50 | | | | | | | | | | |--- lead_time <= 66.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 66.50 | | | | | | | | | | | |--- weights: [32.00, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 75.54 | | | | | | | |--- avg_price_per_room <= 75.62 | | | | | | | | |--- arrival_year <= 2018.00 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- arrival_year > 2018.00 | | | | | | | | | |--- lead_time <= 66.00 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 66.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- avg_price_per_room > 75.62 | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | |--- lead_time <= 44.50 | | | | | | | | | | |--- repeated_guest <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- repeated_guest > 0.50 | | | | | | | | | | | |--- weights: [116.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 44.50 | | | | | | | | | | |--- lead_time <= 70.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 70.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- weights: [464.00, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- lead_time <= 57.50 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | | |--- lead_time > 57.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | |--- arrival_month <= 4.50 | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- arrival_month > 3.50 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | |--- arrival_month > 4.50 | | 
| | | | | |--- market_segment_type_Complementary <= 0.50 | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | |--- market_segment_type_Complementary > 0.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 209.28 | | | | | |--- weights: [0.00, 6.00] class: 1 | | | |--- lead_time > 91.50 | | | | |--- avg_price_per_room <= 91.22 | | | | | |--- no_of_week_nights <= 8.50 | | | | | | |--- lead_time <= 144.50 | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- weights: [20.00, 0.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- arrival_date <= 28.00 | | | | | | | | | | |--- avg_price_per_room <= 62.38 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 62.38 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | |--- arrival_date > 28.00 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | |--- avg_price_per_room <= 62.88 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 62.88 | | | | | | | | | | |--- avg_price_per_room <= 75.33 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- avg_price_per_room > 75.33 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- lead_time > 144.50 | | | | | | | |--- weights: [34.00, 0.00] 
class: 0 | | | | | |--- no_of_week_nights > 8.50 | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | |--- avg_price_per_room > 91.22 | | | | | |--- avg_price_per_room <= 96.61 | | | | | | |--- arrival_date <= 6.50 | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- arrival_date > 6.50 | | | | | | | |--- lead_time <= 145.50 | | | | | | | | |--- lead_time <= 110.50 | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 110.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- lead_time <= 141.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 141.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | |--- lead_time > 145.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | |--- avg_price_per_room > 96.61 | | | | | | |--- avg_price_per_room <= 184.25 | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | |--- lead_time <= 93.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- lead_time > 93.50 | | | | | | | | | |--- avg_price_per_room <= 130.25 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_year > 2017.50 | | | 
| | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | |--- avg_price_per_room > 130.25 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- weights: [12.00, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | |--- avg_price_per_room <= 116.19 | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | | | |--- weights: [40.00, 0.00] class: 0 | | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | |--- lead_time <= 114.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- lead_time > 114.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 116.19 | | | | | | | | | |--- lead_time <= 108.50 | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 108.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 184.25 | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 9.50 | | | | |--- avg_price_per_room <= 202.67 | | | | | |--- lead_time <= 2.50 | | | | | | |--- no_of_week_nights <= 8.50 | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | |--- weights: [100.00, 0.00] class: 0 | | | | | | | |--- arrival_month > 1.50 | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- avg_price_per_room <= 77.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | 
| | |--- avg_price_per_room > 77.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- weights: [96.00, 0.00] class: 0 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- lead_time <= 1.50 | | | | | | | | | | | |--- weights: [11.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | |--- no_of_week_nights > 8.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- lead_time > 2.50 | | | | | | |--- arrival_month <= 1.50 | | | | | | | |--- weights: [79.00, 0.00] class: 0 | | | | | | |--- arrival_month > 1.50 | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- avg_price_per_room <= 82.97 | | | | | | | | | | |--- avg_price_per_room <= 70.24 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 70.24 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- avg_price_per_room > 82.97 | | | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 19 | | | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | | | |--- weights: [13.00, 0.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- arrival_year <= 2018.50 | | | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- arrival_year > 
2018.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | |--- arrival_month > 9.50 | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [32.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [66.00, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 202.67 | | | | | |--- arrival_month <= 1.50 | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- arrival_month > 1.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- weights: [0.00, 50.00] class: 1 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | |--- lead_time > 9.50 | | | | |--- avg_price_per_room <= 194.83 | | | | | |--- required_car_parking_space <= 0.50 | | | | | | |--- lead_time <= 25.50 | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | |--- lead_time <= 24.50 | | | | | | | | | |--- weights: [107.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 24.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- arrival_month > 1.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- avg_price_per_room <= 123.75 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- avg_price_per_room > 123.75 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- 
arrival_year > 2017.50 | | | | | | | | | | |--- arrival_year <= 2018.50 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | | |--- arrival_year > 2018.50 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- lead_time <= 24.50 | | | | | | | | | | |--- weights: [98.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 24.50 | | | | | | | | | | |--- avg_price_per_room <= 100.00 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 100.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- lead_time > 25.50 | | | | | | | |--- avg_price_per_room <= 74.62 | | | | | | | | |--- avg_price_per_room <= 24.85 | | | | | | | | | |--- weights: [29.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 24.85 | | | | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | | | | |--- avg_price_per_room <= 67.31 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | | |--- avg_price_per_room > 67.31 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | | | | |--- arrival_date <= 26.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 26.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- avg_price_per_room > 74.62 | | | | | | | | |--- arrival_year <= 2018.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- lead_time <= 68.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- lead_time > 68.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- avg_price_per_room <= 111.69 | | | | | | | | | | | |--- truncated branch of depth 22 | | | | | | | | | | |--- avg_price_per_room > 111.69 | | | | | | | | | | | |--- truncated branch of depth 25 | | | | | 
| | | |--- arrival_year > 2018.50 | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | |--- lead_time <= 37.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- lead_time > 37.50 | | | | | | | | | | | |--- truncated branch of depth 28 | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | |--- avg_price_per_room <= 111.80 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- avg_price_per_room > 111.80 | | | | | | | | | | | |--- truncated branch of depth 19 | | | | | |--- required_car_parking_space > 0.50 | | | | | | |--- weights: [93.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 194.83 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- avg_price_per_room <= 200.35 | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | |--- arrival_date <= 4.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- arrival_date > 4.50 | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | |--- room_type_reserved_Room_Type 7 <= 0.50 | | | | | | | | | |--- arrival_date <= 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 1.50 | | | | | | | | | | |--- lead_time <= 143.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- lead_time > 143.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 7 > 0.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 200.35 | | | | | | | |--- weights: [0.00, 263.00] class: 1 | | | | | |--- arrival_month > 11.50 | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | |--- no_of_special_requests > 0.50 | | |--- no_of_special_requests <= 1.50 | | | |--- lead_time <= 
7.50 | | | | |--- lead_time <= 4.50 | | | | | |--- no_of_week_nights <= 11.00 | | | | | | |--- avg_price_per_room <= 125.58 | | | | | | | |--- no_of_children <= 1.50 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | |--- weights: [304.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | |--- avg_price_per_room <= 86.72 | | | | | | | | | | | |--- weights: [135.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 86.72 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- arrival_date <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 81.40 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 81.40 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 1.50 | | | | | | | | | | |--- arrival_month <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- arrival_month > 2.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | |--- no_of_children > 1.50 | | | | | | | | |--- arrival_date <= 11.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- arrival_date > 11.50 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 125.58 | | | | | | | |--- lead_time <= 0.50 | | | | | | | | |--- avg_price_per_room <= 181.17 | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | | |--- arrival_date <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 
5.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | |--- avg_price_per_room > 181.17 | | | | | | | | | |--- arrival_date <= 12.50 | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 12.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | |--- lead_time > 0.50 | | | | | | | | |--- avg_price_per_room <= 126.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- avg_price_per_room > 126.50 | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | |--- arrival_date <= 13.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 13.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | |--- lead_time <= 3.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- lead_time > 3.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | |--- no_of_week_nights > 11.00 | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | |--- lead_time > 4.50 | | | | | |--- avg_price_per_room <= 108.99 | | | | | | |--- no_of_week_nights <= 12.50 | | | | | | | |--- no_of_adults <= 0.50 | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- no_of_adults > 0.50 | | | | | | | | |--- avg_price_per_room <= 77.88 | | | | | | | | | |--- arrival_month <= 2.50 | | | | | | | | | | |--- lead_time <= 6.50 | | | | | | | | | | | |--- weights: [17.00, 0.00] 
class: 0 | | | | | | | | | | |--- lead_time > 6.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_month > 2.50 | | | | | | | | | | |--- weights: [77.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 77.88 | | | | | | | | | |--- avg_price_per_room <= 78.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 78.50 | | | | | | | | | | |--- avg_price_per_room <= 98.05 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- avg_price_per_room > 98.05 | | | | | | | | | | | |--- weights: [48.00, 0.00] class: 0 | | | | | | |--- no_of_week_nights > 12.50 | | | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- market_segment_type_Online > 0.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- avg_price_per_room > 108.99 | | | | | | |--- avg_price_per_room <= 109.25 | | | | | | | |--- arrival_date <= 6.00 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- arrival_date > 6.00 | | | | | | | | |--- arrival_date <= 27.00 | | | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- arrival_date > 8.50 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 27.00 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- avg_price_per_room > 109.25 | | | | | | | |--- no_of_week_nights <= 7.50 | | | | | | | | |--- avg_price_per_room <= 138.79 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | | |--- weights: [13.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | | |--- truncated branch of depth 4 
| | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- avg_price_per_room > 138.79 | | | | | | | | | |--- avg_price_per_room <= 165.88 | | | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- avg_price_per_room > 165.88 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- no_of_week_nights > 7.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | |--- lead_time > 7.50 | | | | |--- avg_price_per_room <= 121.78 | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | |--- lead_time <= 91.50 | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | |--- avg_price_per_room <= 109.75 | | | | | | | | | |--- weights: [434.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 109.75 | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | |--- weights: [30.00, 0.00] class: 0 | | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | |--- lead_time <= 25.00 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 25.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | |--- avg_price_per_room <= 99.00 | | | | | | | | | |--- weights: [18.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 99.00 | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | 
| | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | |--- lead_time > 91.50 | | | | | | | |--- avg_price_per_room <= 120.50 | | | | | | | | |--- arrival_date <= 13.50 | | | | | | | | | |--- arrival_date <= 5.50 | | | | | | | | | | |--- lead_time <= 92.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- lead_time > 92.50 | | | | | | | | | | | |--- weights: [27.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 5.50 | | | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | |--- arrival_date > 13.50 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- lead_time <= 135.50 | | | | | | | | | | | |--- weights: [52.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 135.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- avg_price_per_room > 120.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- market_segment_type_Online > 0.50 | | | | | | |--- lead_time <= 68.50 | | | | | | | |--- no_of_weekend_nights <= 2.50 | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | |--- weights: [241.00, 0.00] class: 0 | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 21 | | | | | | | | | | |--- 
type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 23 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [195.00, 0.00] class: 0 | | | | | | | |--- no_of_weekend_nights > 2.50 | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | |--- arrival_date <= 10.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- arrival_date > 10.50 | | | | | | | | | | |--- lead_time <= 46.50 | | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 46.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- lead_time <= 53.00 | | | | | | | | | | | |--- weights: [0.00, 16.00] class: 1 | | | | | | | | | | |--- lead_time > 53.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- lead_time > 68.50 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | |--- avg_price_per_room <= 63.45 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 63.45 | | | | | | | | | | |--- lead_time <= 123.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 123.00 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | |--- avg_price_per_room <= 87.38 | | | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | |--- avg_price_per_room > 87.38 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- 
type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- avg_price_per_room <= 71.19 | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | |--- avg_price_per_room <= 55.82 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 55.82 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | |--- weights: [42.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 71.19 | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | |--- avg_price_per_room > 121.78 | | | | | |--- required_car_parking_space <= 0.50 | | | | | | |--- avg_price_per_room <= 192.17 | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | |--- lead_time <= 27.50 | | | | | | | | | |--- avg_price_per_room <= 149.67 | | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- avg_price_per_room > 149.67 | | | | | | | | | | |--- arrival_date <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_date > 1.50 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | |--- lead_time > 27.50 | | | | | | | | | |--- no_of_week_nights <= 7.50 | | | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | 
| | | | | | | | | |--- truncated branch of depth 26 | | | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- no_of_week_nights > 7.50 | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | |--- arrival_year <= 2018.50 | | | | | | | | | |--- lead_time <= 27.50 | | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 27.50 | | | | | | | | | | |--- avg_price_per_room <= 158.85 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- avg_price_per_room > 158.85 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- arrival_year > 2018.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- arrival_date <= 23.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- arrival_date > 23.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- lead_time <= 58.00 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- lead_time > 58.00 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | |--- avg_price_per_room > 192.17 | | | | | | | |--- lead_time <= 87.00 | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | |--- lead_time <= 21.50 | | | | | | | | | | |--- avg_price_per_room <= 311.40 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- avg_price_per_room > 311.40 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- lead_time > 21.50 | | | | | | | | | | |--- arrival_date <= 23.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- 
arrival_date > 23.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | |--- arrival_date <= 25.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- arrival_date > 25.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- lead_time > 87.00 | | | | | | | | |--- avg_price_per_room <= 200.40 | | | | | | | | | |--- avg_price_per_room <= 198.49 | | | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- avg_price_per_room > 198.49 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 200.40 | | | | | | | | | |--- lead_time <= 92.00 | | | | | | | | | | |--- arrival_date <= 20.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 20.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- lead_time > 92.00 | | | | | | | | | | |--- weights: [0.00, 25.00] class: 1 | | | | | |--- required_car_parking_space > 0.50 | | | | | | |--- weights: [134.00, 0.00] class: 0 | | |--- no_of_special_requests > 1.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_week_nights <= 3.50 | | | | | |--- lead_time <= 89.50 | | | | | | |--- weights: [2946.00, 0.00] class: 0 | | | | | |--- lead_time > 89.50 | | | | | | |--- no_of_children <= 0.50 | | | | | | | |--- weights: [13.00, 0.00] class: 0 | | | | | | |--- no_of_children > 0.50 | | | | | | | |--- avg_price_per_room <= 
121.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 121.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | |--- no_of_week_nights > 3.50 | | | | | |--- no_of_week_nights <= 9.50 | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | |--- lead_time <= 6.50 | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | |--- weights: [50.00, 0.00] class: 0 | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- no_of_children <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- no_of_children > 1.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | |--- lead_time > 6.50 | | | | | | | | |--- arrival_date <= 30.50 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- weights: [34.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | | |--- weights: [28.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 30.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- lead_time <= 19.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 19.00 | | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | |--- weights: [108.00, 0.00] class: 0 | | | | | |--- no_of_week_nights > 9.50 | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | |--- lead_time <= 3.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- lead_time > 3.00 | | | | 
| | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | |--- lead_time > 90.50 | | | | |--- avg_price_per_room <= 200.15 | | | | | |--- no_of_special_requests <= 2.50 | | | | | | |--- arrival_month <= 4.50 | | | | | | | |--- arrival_year <= 2018.50 | | | | | | | | |--- avg_price_per_room <= 103.44 | | | | | | | | | |--- arrival_date <= 2.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- arrival_date > 2.50 | | | | | | | | | | |--- weights: [24.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 103.44 | | | | | | | | | |--- avg_price_per_room <= 151.65 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 151.65 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- arrival_year > 2018.50 | | | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- arrival_date <= 25.00 | | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | | | |--- arrival_date > 25.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- avg_price_per_room <= 73.04 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 73.04 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | |--- arrival_month > 4.50 | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | |--- arrival_date <= 2.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | 
|--- arrival_date <= 21.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 21.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- no_of_week_nights <= 9.00 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | |--- no_of_week_nights > 9.00 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- arrival_month > 10.50 | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | |--- avg_price_per_room <= 138.40 | | | | | | | | | | |--- lead_time <= 99.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 99.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | |--- avg_price_per_room > 138.40 | | | | | | | | | | |--- room_type_reserved_Room_Type 6 <= 0.50 | | | | | | | | | | | |--- weights: [11.00, 0.00] class: 0 | | | | | | | | | | |--- room_type_reserved_Room_Type 6 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | |--- no_of_special_requests > 2.50 | | | | | | |--- weights: [172.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 200.15 | | | | | |--- no_of_special_requests <= 2.50 | | | | | | |--- weights: [0.00, 25.00] class: 1 | | | | | |--- no_of_special_requests > 2.50 | | | | | | |--- weights: [5.00, 0.00] class: 0 |--- lead_time > 150.50 | |--- avg_price_per_room <= 100.04 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 274.00 | | | | |--- avg_price_per_room <= 89.55 | | | | | |--- no_of_week_nights <= 9.50 | | | | | | |--- avg_price_per_room <= 27.07 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- avg_price_per_room > 27.07 | 
| | | | | | |--- no_of_previous_cancellations <= 8.00 | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | |--- arrival_date <= 24.50 | | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | |--- arrival_date > 24.50 | | | | | | | | | | |--- no_of_special_requests <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_special_requests > 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | |--- lead_time <= 205.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- lead_time > 205.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- no_of_previous_cancellations > 8.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- no_of_week_nights > 9.50 | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | |--- avg_price_per_room > 89.55 | | | | | |--- lead_time <= 223.00 | | | | | | |--- arrival_month <= 6.50 | | | | | | | |--- no_of_special_requests <= 0.50 | | | | | | | | |--- lead_time <= 183.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 183.50 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- arrival_year <= 2018.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- arrival_year > 2018.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | |--- no_of_special_requests > 0.50 | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | 
|--- arrival_month > 6.50 | | | | | | | |--- arrival_date <= 30.50 | | | | | | | | |--- lead_time <= 155.50 | | | | | | | | | |--- lead_time <= 154.00 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 154.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- lead_time > 155.50 | | | | | | | | | |--- lead_time <= 195.50 | | | | | | | | | | |--- weights: [26.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 195.50 | | | | | | | | | | |--- lead_time <= 201.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 201.50 | | | | | | | | | | | |--- weights: [10.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 30.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- lead_time > 223.00 | | | | | | |--- avg_price_per_room <= 94.00 | | | | | | | |--- avg_price_per_room <= 90.77 | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | |--- lead_time <= 243.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- lead_time > 243.00 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | |--- avg_price_per_room > 90.77 | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 94.00 | | | | | | | |--- no_of_special_requests <= 1.50 | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | |--- no_of_special_requests > 1.50 | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | |--- lead_time > 274.00 | | | | |--- lead_time <= 389.50 | | | | | |--- avg_price_per_room <= 84.38 | | | | | | |--- lead_time <= 277.50 | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | |--- lead_time > 277.50 | | | | | | | |--- avg_price_per_room <= 
69.44 | | | | | | | | |--- arrival_date <= 24.00 | | | | | | | | | |--- weights: [13.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 24.00 | | | | | | | | | |--- lead_time <= 306.00 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 306.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- avg_price_per_room > 69.44 | | | | | | | | |--- lead_time <= 297.00 | | | | | | | | | |--- avg_price_per_room <= 74.25 | | | | | | | | | | |--- no_of_special_requests <= 0.50 | | | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | | | | | |--- no_of_special_requests > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 74.25 | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 297.00 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- no_of_week_nights <= 5.00 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- no_of_week_nights > 5.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- avg_price_per_room > 84.38 | | | | | | |--- lead_time <= 319.50 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- weights: [0.00, 11.00] class: 1 | | | | | | | |--- 
type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | |--- lead_time > 319.50 | | | | | | | |--- lead_time <= 345.50 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- no_of_special_requests <= 0.50 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | |--- no_of_special_requests > 0.50 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | |--- lead_time > 345.50 | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | |--- no_of_special_requests <= 0.50 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- no_of_special_requests > 0.50 | | | | | | | | | | |--- no_of_week_nights <= 1.00 | | | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 1.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | |--- lead_time > 389.50 | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | |--- weights: [0.00, 22.00] class: 1 | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | |--- lead_time <= 414.00 | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | |--- lead_time > 414.00 | | | | | | | |--- avg_price_per_room <= 22.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- avg_price_per_room > 22.50 | | | | | | | | |--- avg_price_per_room <= 59.25 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 
59.25 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | |--- market_segment_type_Online > 0.50 | | | |--- no_of_special_requests <= 0.50 | | | | |--- avg_price_per_room <= 1.50 | | | | | |--- lead_time <= 285.50 | | | | | | |--- no_of_adults <= 1.50 | | | | | | | |--- weights: [10.00, 0.00] class: 0 | | | | | | |--- no_of_adults > 1.50 | | | | | | | |--- arrival_year <= 2018.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- arrival_year > 2018.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- lead_time > 285.50 | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | |--- avg_price_per_room > 1.50 | | | | | |--- no_of_adults <= 2.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- weights: [0.00, 558.00] class: 1 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | |--- lead_time <= 214.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 214.50 | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- lead_time <= 223.50 | | | | | | | | | |--- lead_time <= 212.50 | | | | | | | | | | |--- weights: [0.00, 10.00] class: 1 | | | | | | | | | |--- lead_time > 212.50 | | | | | | | | | | |--- arrival_date <= 18.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- arrival_date > 18.00 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 223.50 | | | | | | | | | |--- weights: [0.00, 39.00] class: 1 | | | | | |--- no_of_adults > 2.50 | | | | | | |--- lead_time <= 164.50 | | | | | | | |--- weights: 
[0.00, 1.00] class: 1 | | | | | | |--- lead_time > 164.50 | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | |--- no_of_special_requests > 0.50 | | | | |--- no_of_weekend_nights <= 0.50 | | | | | |--- lead_time <= 180.50 | | | | | | |--- arrival_month <= 2.50 | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | |--- arrival_date <= 18.50 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 18.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- arrival_month > 1.50 | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | |--- arrival_month > 2.50 | | | | | | | |--- arrival_date <= 22.50 | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | |--- arrival_date <= 17.00 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- arrival_date > 17.00 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | |--- arrival_date <= 5.50 | | | | | | | | | | |--- lead_time <= 178.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 178.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- arrival_date > 5.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | |--- arrival_date > 22.50 | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- type_of_meal_plan_Not 
Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | |--- lead_time > 180.50 | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | |--- avg_price_per_room <= 36.16 | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 36.16 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- no_of_week_nights <= 2.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- no_of_week_nights > 2.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- weights: [0.00, 191.00] class: 1 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- lead_time <= 276.50 | | | | | | | | | | |--- lead_time <= 221.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 221.50 | | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 276.50 | | | | | | | | | | |--- avg_price_per_room <= 75.28 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 75.28 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | |--- weights: [17.00, 0.00] class: 0 | | | | |--- no_of_weekend_nights > 0.50 | | | | | |--- no_of_special_requests <= 2.50 | | | | | | |--- lead_time <= 328.00 | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | |--- lead_time <= 159.50 | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 159.50 | | | | | | | | | |--- arrival_year <= 2018.50 | | | | | | | | | | |--- avg_price_per_room <= 96.07 | | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 96.07 | | | | | | | | | | | |--- weights: 
[0.00, 1.00] class: 1 | | | | | | | | | |--- arrival_year > 2018.50 | | | | | | | | | | |--- lead_time <= 165.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 165.00 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | | |--- arrival_month > 4.50 | | | | | | | | |--- no_of_week_nights <= 9.50 | | | | | | | | | |--- lead_time <= 216.50 | | | | | | | | | | |--- avg_price_per_room <= 96.46 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | | |--- avg_price_per_room > 96.46 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | |--- lead_time > 216.50 | | | | | | | | | | |--- lead_time <= 289.50 | | | | | | | | | | | |--- truncated branch of depth 20 | | | | | | | | | | |--- lead_time > 289.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- no_of_week_nights > 9.50 | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | |--- lead_time > 328.00 | | | | | | | |--- avg_price_per_room <= 90.08 | | | | | | | | |--- weights: [0.00, 10.00] class: 1 | | | | | | | |--- avg_price_per_room > 90.08 | | | | | | | | |--- arrival_month <= 6.00 | | | | | | | | | |--- lead_time <= 341.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- lead_time > 341.00 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- arrival_month > 6.00 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | |--- no_of_special_requests > 2.50 | | | | | | |--- weights: [46.00, 0.00] class: 0 | |--- avg_price_per_room > 100.04 | | |--- no_of_special_requests <= 2.50 | | | |--- arrival_month <= 11.50 | | | | |--- arrival_month <= 1.50 | | | | | |--- no_of_special_requests <= 0.50 | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | |--- 
no_of_special_requests > 0.50 | | | | | | |--- lead_time <= 204.50 | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- lead_time > 204.50 | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | |--- arrival_month > 1.50 | | | | | |--- weights: [0.00, 2688.00] class: 1 | | | |--- arrival_month > 11.50 | | | | |--- no_of_special_requests <= 0.50 | | | | | |--- weights: [37.00, 0.00] class: 0 | | | | |--- no_of_special_requests > 0.50 | | | | | |--- no_of_special_requests <= 1.50 | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | |--- avg_price_per_room <= 145.22 | | | | | | | | |--- weights: [0.00, 12.00] class: 1 | | | | | | | |--- avg_price_per_room > 145.22 | | | | | | | | |--- avg_price_per_room <= 147.19 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 147.19 | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- no_of_special_requests > 1.50 | | | | | | |--- avg_price_per_room <= 146.39 | | | | | | | |--- arrival_date <= 28.50 | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 28.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- avg_price_per_room > 146.39 | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | |--- no_of_special_requests > 2.50 | | | |--- weights: [142.00, 0.00] class: 0
importance_of_attributes = pd.DataFrame(dTree.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False)
importance_of_attributes
| | Imp |
|---|---|
| lead_time | 0.344685 |
| avg_price_per_room | 0.158624 |
| no_of_special_requests | 0.097710 |
| arrival_date | 0.091935 |
| market_segment_type_Online | 0.077835 |
| arrival_month | 0.063383 |
| no_of_week_nights | 0.054088 |
| no_of_weekend_nights | 0.032010 |
| no_of_adults | 0.018499 |
| arrival_year | 0.016955 |
| type_of_meal_plan_Not Selected | 0.009919 |
| room_type_reserved_Room_Type 4 | 0.008643 |
| required_car_parking_space | 0.007496 |
| no_of_children | 0.006298 |
| type_of_meal_plan_Meal Plan 2 | 0.003403 |
| room_type_reserved_Room_Type 5 | 0.002953 |
| room_type_reserved_Room_Type 2 | 0.001715 |
| room_type_reserved_Room_Type 6 | 0.001426 |
| market_segment_type_Corporate | 0.000689 |
| market_segment_type_Offline | 0.000503 |
| repeated_guest | 0.000492 |
| room_type_reserved_Room_Type 7 | 0.000277 |
| no_of_previous_bookings_not_canceled | 0.000185 |
| market_segment_type_Complementary | 0.000129 |
| no_of_previous_cancellations | 0.000122 |
| room_type_reserved_Room_Type 3 | 0.000025 |
| type_of_meal_plan_Meal Plan 3 | 0.000000 |
| const | 0.000000 |
importances = dTree.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12,12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='green', align='center')
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
According to the decision tree model, lead_time is the most important variable for predicting booking cancellations.
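The importance table and bar chart are rebuilt by hand for each tree in this notebook. A minimal sketch of a reusable helper is shown below; `importance_table`, the synthetic data, and the `feature_i` column names are hypothetical stand-ins, not part of the original analysis.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def importance_table(model, columns):
    """Return impurity-based importances as a DataFrame, largest first."""
    return (
        pd.DataFrame({"Imp": model.feature_importances_}, index=columns)
        .sort_values(by="Imp", ascending=False)
    )


# Synthetic stand-in data (assumption: NOT the hotel bookings data).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, n_informative=3, random_state=1
)
demo_columns = [f"feature_{i}" for i in range(X_demo.shape[1])]
demo_tree = DecisionTreeClassifier(random_state=1).fit(X_demo, y_demo)

demo_importances = importance_table(demo_tree, demo_columns)
print(demo_importances)
```

With the notebook's objects, the equivalent call would presumably be `importance_table(dTree, X_train.columns)`. These importances are impurity-based, so they describe how the fitted tree split the training data rather than a causal effect on cancellations.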
d_Tree = DecisionTreeClassifier(criterion = 'gini',max_depth=3,random_state=1)
d_Tree.fit(X_train, y_train)
DecisionTreeClassifier(max_depth=3, random_state=1)
make_confusion_matrix_for_model(d_Tree, y_test)
# Accuracy on train and test
print("Accuracy value of training set : ",d_Tree.score(X_train, y_train))
print("Accuracy value of test set : ",d_Tree.score(X_test, y_test))
Accuracy value of training set : 0.7710968694426735
Accuracy value of test set : 0.7707664605026228
get_recall_value(d_Tree)
Recall on training set : 0.7480447480447481
Recall on test set : 0.7549019607843137
get_f1_value(d_Tree)
f1 on training set : 0.6889760189659889
f1 on test set : 0.6934031413612565
plt.figure(figsize=(15,10))
tree.plot_tree(d_Tree,feature_names=feature_names,filled=True,fontsize=9,node_ids=True,class_names=True)
plt.show()
#Text Format
print(tree.export_text(d_Tree,feature_names=feature_names,show_weights=True))
|--- lead_time <= 150.50
|   |--- no_of_special_requests <= 0.50
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- weights: [3328.00, 342.00] class: 0
|   |   |--- market_segment_type_Online > 0.50
|   |   |   |--- weights: [3649.00, 3792.00] class: 1
|   |--- no_of_special_requests > 0.50
|   |   |--- no_of_special_requests <= 1.50
|   |   |   |--- weights: [7380.00, 1805.00] class: 0
|   |   |--- no_of_special_requests > 1.50
|   |   |   |--- weights: [4141.00, 269.00] class: 0
|--- lead_time > 150.50
|   |--- avg_price_per_room <= 100.04
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- weights: [434.00, 129.00] class: 0
|   |   |--- market_segment_type_Online > 0.50
|   |   |   |--- weights: [575.00, 1047.00] class: 1
|   |--- avg_price_per_room > 100.04
|   |   |--- no_of_special_requests <= 2.50
|   |   |   |--- weights: [53.00, 2717.00] class: 1
|   |   |--- no_of_special_requests > 2.50
|   |   |   |--- weights: [142.00, 0.00] class: 0
# Importance of each feature in building the depth-limited tree
importance_of_the_features = pd.DataFrame(d_Tree.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False)
importance_of_the_features
| | Imp |
|---|---|
| lead_time | 0.486268 |
| market_segment_type_Online | 0.218481 |
| no_of_special_requests | 0.209991 |
| avg_price_per_room | 0.085260 |
| no_of_weekend_nights | 0.000000 |
| type_of_meal_plan_Meal Plan 3 | 0.000000 |
| market_segment_type_Offline | 0.000000 |
| market_segment_type_Corporate | 0.000000 |
| market_segment_type_Complementary | 0.000000 |
| room_type_reserved_Room_Type 7 | 0.000000 |
| room_type_reserved_Room_Type 6 | 0.000000 |
| room_type_reserved_Room_Type 5 | 0.000000 |
| room_type_reserved_Room_Type 4 | 0.000000 |
| room_type_reserved_Room_Type 3 | 0.000000 |
| room_type_reserved_Room_Type 2 | 0.000000 |
| type_of_meal_plan_Not Selected | 0.000000 |
| type_of_meal_plan_Meal Plan 2 | 0.000000 |
| no_of_week_nights | 0.000000 |
| no_of_adults | 0.000000 |
| no_of_previous_bookings_not_canceled | 0.000000 |
| no_of_previous_cancellations | 0.000000 |
| repeated_guest | 0.000000 |
| arrival_date | 0.000000 |
| arrival_month | 0.000000 |
| arrival_year | 0.000000 |
| no_of_children | 0.000000 |
| required_car_parking_space | 0.000000 |
| const | 0.000000 |
importances_of_d_Tree = d_Tree.feature_importances_
indices = np.argsort(importances_of_d_Tree)
plt.figure(figsize=(10,10))
plt.title('Feature Importances of d_Tree')
plt.barh(range(len(indices)), importances_of_d_Tree[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel('Relative Importance of d_Tree')
plt.show()
# Choose the type of classifier.
estimator2 = DecisionTreeClassifier(random_state=1)
# Grid of parameters to choose from
parameters = {'max_depth': np.arange(1,10),
'min_samples_leaf': [1, 2, 4, 7, 9, 10, 12],
"criterion": ["entropy", "gini"],
'min_impurity_decrease': [0.001,0.01,0.1]
}
# Scorer used to compare parameter combinations (recall)
recall_scorer = metrics.make_scorer(metrics.recall_score)
# Run the grid search
grid_obj = GridSearchCV(estimator2, parameters, scoring=recall_scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
estimator2 = grid_obj.best_estimator_
# Fit the best algorithm to the data.
estimator2.fit(X_train, y_train)
DecisionTreeClassifier(criterion='entropy', max_depth=3,
min_impurity_decrease=0.01, random_state=1)
make_confusion_matrix_for_model(estimator2,y_test)
# Accuracy on train and test
print("Accuracy on training set : ",estimator2.score(X_train, y_train))
print("Accuracy on test set : ",estimator2.score(X_test, y_test))
# Recall on train and test
get_recall_value(estimator2)
get_f1_value(estimator2)
Accuracy on training set : 0.7608630003690904
Accuracy on test set : 0.7609801925937525
Recall on training set : 0.7608157608157609
Recall on test set : 0.7701778385772914
f1 on training set : 0.6832022047384095
f1 on test set : 0.6887552247935569
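The grid search above can also be inspected directly instead of only through the refitted estimator. The following is a minimal sketch on synthetic data; `X_demo`, `y_demo`, `param_grid`, and `search` are hypothetical names, and the grid is deliberately smaller than the one used in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: NOT the hotel bookings data).
X_demo, y_demo = make_classification(n_samples=600, n_features=8, random_state=1)

# A small illustrative grid; the notebook's grid has more parameters.
param_grid = {"max_depth": [2, 3, 4], "min_samples_leaf": [1, 5]}

# Recall on the positive class is the selection criterion.
recall_scorer = make_scorer(recall_score)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid,
    scoring=recall_scorer,
    cv=5,
)
search.fit(X_demo, y_demo)

# Winning parameter combination and its mean cross-validated recall.
print(search.best_params_)
print(search.best_score_)

# With the default refit=True, best_estimator_ is already fitted on all
# of the data passed to fit(), so it can be used for prediction directly.
demo_predictions = search.best_estimator_.predict(X_demo)
```

Because `refit=True` is the default, calling `fit` again on `best_estimator_` repeats work that the search has already done; it is harmless but not required.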
plt.figure(figsize=(15,10))
tree.plot_tree(estimator2,feature_names=feature_names,filled=True,fontsize=9,node_ids=True,class_names=True)
plt.show()
# Text report showing the rules of a decision tree -
print(tree.export_text(estimator2,feature_names=feature_names,show_weights=True))
|--- lead_time <= 150.50
|   |--- no_of_special_requests <= 0.50
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- weights: [3328.00, 342.00] class: 0
|   |   |--- market_segment_type_Online > 0.50
|   |   |   |--- weights: [3649.00, 3792.00] class: 1
|   |--- no_of_special_requests > 0.50
|   |   |--- no_of_special_requests <= 1.50
|   |   |   |--- weights: [7380.00, 1805.00] class: 0
|   |   |--- no_of_special_requests > 1.50
|   |   |   |--- weights: [4141.00, 269.00] class: 0
|--- lead_time > 150.50
|   |--- avg_price_per_room <= 100.04
|   |   |--- weights: [1009.00, 1176.00] class: 1
|   |--- avg_price_per_room > 100.04
|   |   |--- no_of_special_requests <= 2.50
|   |   |   |--- weights: [53.00, 2717.00] class: 1
|   |   |--- no_of_special_requests > 2.50
|   |   |   |--- weights: [142.00, 0.00] class: 0
importance_estimator2 = (
pd.DataFrame(
estimator2.feature_importances_, columns=["Imp"], index=X_train.columns
).sort_values(by="Imp", ascending=False)
)
importance_estimator2
| | Imp |
|---|---|
| lead_time | 0.434355 |
| no_of_special_requests | 0.271789 |
| market_segment_type_Online | 0.190693 |
| avg_price_per_room | 0.103163 |
| no_of_weekend_nights | 0.000000 |
| type_of_meal_plan_Meal Plan 3 | 0.000000 |
| market_segment_type_Offline | 0.000000 |
| market_segment_type_Corporate | 0.000000 |
| market_segment_type_Complementary | 0.000000 |
| room_type_reserved_Room_Type 7 | 0.000000 |
| room_type_reserved_Room_Type 6 | 0.000000 |
| room_type_reserved_Room_Type 5 | 0.000000 |
| room_type_reserved_Room_Type 4 | 0.000000 |
| room_type_reserved_Room_Type 3 | 0.000000 |
| room_type_reserved_Room_Type 2 | 0.000000 |
| type_of_meal_plan_Not Selected | 0.000000 |
| type_of_meal_plan_Meal Plan 2 | 0.000000 |
| no_of_week_nights | 0.000000 |
| no_of_adults | 0.000000 |
| no_of_previous_bookings_not_canceled | 0.000000 |
| no_of_previous_cancellations | 0.000000 |
| repeated_guest | 0.000000 |
| arrival_date | 0.000000 |
| arrival_month | 0.000000 |
| arrival_year | 0.000000 |
| no_of_children | 0.000000 |
| required_car_parking_space | 0.000000 |
| const | 0.000000 |
importances = estimator2.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12,12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
dtree_tune_train_performance = model_performance_statsmodel(
estimator2, X_train, y_train
)
dtree_tune_train_performance
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.760863 | 0.760816 | 0.619958 | 0.683202 |
confusion_matrix_for_statsmodel(estimator2, X_train, y_train)
dtree_tune_test_performance = model_performance_statsmodel(
estimator2, X_test, y_test
)
dtree_tune_test_performance
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.76098 | 0.770178 | 0.622902 | 0.688755 |
confusion_matrix_for_statsmodel(estimator2, X_test, y_test)
dTree_classifier = DecisionTreeClassifier(random_state=1)
path = dTree_classifier.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
pd.DataFrame(path)
| | ccp_alphas | impurities |
|---|---|---|
| 0 | 0.000000 | 0.003288 |
| 1 | 0.000000 | 0.003288 |
| 2 | 0.000000 | 0.003288 |
| 3 | 0.000000 | 0.003288 |
| 4 | 0.000000 | 0.003288 |
| ... | ... | ... |
| 2008 | 0.008721 | 0.299718 |
| 2009 | 0.012485 | 0.312203 |
| 2010 | 0.013059 | 0.325262 |
| 2011 | 0.024185 | 0.373632 |
| 2012 | 0.074478 | 0.448110 |
2013 rows × 2 columns
fig, bx = plt.subplots(figsize=(10,5))
bx.plot(ccp_alphas[:-1], impurities[:-1], marker='o', drawstyle="steps-post")
bx.set_xlabel("effective alpha")
bx.set_ylabel("total impurity - leaves")
bx.set_title("Total Impurity vs effective alpha for training set")
plt.show()
decision_tree_classifiers = []
for ccp_alpha in ccp_alphas:
    dTree_classifier = DecisionTreeClassifier(random_state=1, ccp_alpha=ccp_alpha)
    dTree_classifier.fit(X_train, y_train)
    decision_tree_classifiers.append(dTree_classifier)
print("Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
    decision_tree_classifiers[-1].tree_.node_count, ccp_alphas[-1]))
Number of nodes in the last tree is: 1 with ccp_alpha: 0.074477957873341
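The output above reflects how cost-complexity pruning works: as `ccp_alpha` grows, more of the tree is pruned away, and the largest alpha on the path leaves only the root node. A minimal sketch on synthetic data follows; `X_demo`, `y_demo`, `demo_alphas`, and `node_counts` are hypothetical names, and clipping the alphas at zero is a defensive assumption against tiny negative values from floating-point rounding.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: NOT the hotel bookings data).
X_demo, y_demo = make_classification(n_samples=400, n_features=6, random_state=1)

# Effective alphas at which successive subtrees are pruned away.
path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(
    X_demo, y_demo
)

# Defensive clip: ccp_alpha must be non-negative.
demo_alphas = np.clip(path.ccp_alphas, 0.0, None)

# Fit one tree per alpha and record its size.
node_counts = []
for alpha in demo_alphas:
    pruned = DecisionTreeClassifier(random_state=1, ccp_alpha=alpha)
    pruned.fit(X_demo, y_demo)
    node_counts.append(pruned.tree_.node_count)

print("nodes at smallest alpha:", node_counts[0])
print("nodes at largest alpha: ", node_counts[-1])
```

This is why the last classifier and the last alpha are dropped before plotting: a root-only tree predicts the same class for every booking and adds nothing to the comparison.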
decision_tree_classifiers = decision_tree_classifiers[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [dTree_classifier.tree_.node_count for dTree_classifier in decision_tree_classifiers]
depth = [dTree_classifier.tree_.max_depth for dTree_classifier in decision_tree_classifiers]
fig, bx = plt.subplots(2, 1,figsize=(10,7))
bx[0].plot(ccp_alphas, node_counts, marker='o', drawstyle="steps-post")
bx[0].set_xlabel("alpha")
bx[0].set_ylabel("number of nodes")
bx[0].set_title("Number of nodes vs alpha")
bx[1].plot(ccp_alphas, depth, marker='o', drawstyle="steps-post")
bx[1].set_xlabel("alpha")
bx[1].set_ylabel("depth of the tree")
bx[1].set_title("Depth vs alpha")
fig.tight_layout()
train_score_value = [dTree_classifier.score(X_train, y_train) for dTree_classifier in decision_tree_classifiers]
test_score_value = [dTree_classifier.score(X_test, y_test) for dTree_classifier in decision_tree_classifiers]
fig, cx = plt.subplots(figsize=(10,5))
cx.set_xlabel("alpha")
cx.set_ylabel("accuracy")
cx.set_title("Accuracy vs alpha for training and testing sets")
cx.plot(ccp_alphas, train_score_value, marker='o', label="train",
drawstyle="steps-post")
cx.plot(ccp_alphas, test_score_value, marker='o', label="test",
drawstyle="steps-post")
cx.legend()
plt.show()
best_model_head = np.argmax(test_score_value)
best_model = decision_tree_classifiers[best_model_head]
print(best_model)
print('Training accuracy of best model: ',best_model.score(X_train, y_train))
print('Test accuracy of best model: ',best_model.score(X_test, y_test))
DecisionTreeClassifier(ccp_alpha=0.00013438608720037193, random_state=1)
Training accuracy of best model: 0.8453511391470657
Test accuracy of best model: 0.838409144288734
recall_train = []
for dTree_classifier in decision_tree_classifiers:
    pred_train_set = dTree_classifier.predict(X_train)
    train_values = metrics.recall_score(y_train, pred_train_set)
    recall_train.append(train_values)
recall_test = []
for dTree_classifier in decision_tree_classifiers:
    pred_test_set = dTree_classifier.predict(X_test)
    test_values = metrics.recall_score(y_test, pred_test_set)
    recall_test.append(test_values)
fig, dx = plt.subplots(figsize=(15,5))
dx.set_xlabel("alpha")
dx.set_ylabel("Recall")
dx.set_title("Recall vs alpha for training and testing sets")
dx.plot(ccp_alphas, recall_train, marker='o', label="train",
drawstyle="steps-post")
dx.plot(ccp_alphas, recall_test, marker='o', label="test",
drawstyle="steps-post")
dx.legend()
plt.show()
# Select the pruned tree with the highest recall on the test set
best_model_1 = np.argmax(recall_test)
best_model_head1 = decision_tree_classifiers[best_model_1]
print(best_model_head1)
DecisionTreeClassifier(ccp_alpha=0.012485002220168545, random_state=1)
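The alpha above is chosen by recall on the test set, so the test set takes part in model selection. One alternative, sketched here on synthetic data, is to choose `ccp_alpha` by cross-validated recall on the training data and keep the test set for the final check only. This is a swapped-in technique, not the procedure used in this notebook; all names (`X_tr`, `candidate_alphas`, `cv_recall`, `final_tree`) are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: NOT the hotel bookings data).
X_demo, y_demo = make_classification(n_samples=600, n_features=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)

# Candidate alphas from the pruning path of the training data.
# The last alpha (root-only tree) is dropped; negatives are clipped to zero.
path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(X_tr, y_tr)
candidate_alphas = np.unique(np.clip(path.ccp_alphas[:-1], 0.0, None))

# Mean 5-fold cross-validated recall for each candidate alpha.
cv_recall = [
    cross_val_score(
        DecisionTreeClassifier(random_state=1, ccp_alpha=alpha),
        X_tr,
        y_tr,
        scoring="recall",
        cv=5,
    ).mean()
    for alpha in candidate_alphas
]

# Refit on the full training split with the selected alpha.
best_alpha = candidate_alphas[int(np.argmax(cv_recall))]
final_tree = DecisionTreeClassifier(random_state=1, ccp_alpha=best_alpha)
final_tree.fit(X_tr, y_tr)

print("selected ccp_alpha:", best_alpha)
print("held-out accuracy: ", final_tree.score(X_te, y_te))
```

On the full bookings data the pruning path has about two thousand alphas, so cross-validating every one of them would be slow; subsampling the candidate alphas is one way to keep the cost manageable.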
make_confusion_matrix_for_model(best_model_head1,y_test)
# Recall of best model on train and test
get_recall_value(best_model_head1)
Recall on training set : 0.7608157608157609
Recall on test set : 0.7701778385772914
get_f1_value(best_model_head1)
f1 on training set : 0.6789169132912232
f1 on test set : 0.68408262454435
plt.figure(figsize=(17,15))
tree.plot_tree(best_model_head1,feature_names=feature_names,filled=True,fontsize=9,node_ids=True,class_names=True)
plt.show()
# Text report
print(tree.export_text(best_model_head1,feature_names=feature_names,show_weights=True))
|--- lead_time <= 150.50
|   |--- no_of_special_requests <= 0.50
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- weights: [3328.00, 342.00] class: 0
|   |   |--- market_segment_type_Online > 0.50
|   |   |   |--- weights: [3649.00, 3792.00] class: 1
|   |--- no_of_special_requests > 0.50
|   |   |--- weights: [11521.00, 2074.00] class: 0
|--- lead_time > 150.50
|   |--- avg_price_per_room <= 100.04
|   |   |--- weights: [1009.00, 1176.00] class: 1
|   |--- avg_price_per_room > 100.04
|   |   |--- weights: [195.00, 2717.00] class: 1
feature_importance_best_model = pd.DataFrame(best_model_head1.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False)
feature_importance_best_model
| | Imp |
|---|---|
| lead_time | 0.548006 |
| market_segment_type_Online | 0.210444 |
| no_of_special_requests | 0.145464 |
| avg_price_per_room | 0.096085 |
| no_of_weekend_nights | 0.000000 |
| type_of_meal_plan_Meal Plan 3 | 0.000000 |
| market_segment_type_Offline | 0.000000 |
| market_segment_type_Corporate | 0.000000 |
| market_segment_type_Complementary | 0.000000 |
| room_type_reserved_Room_Type 7 | 0.000000 |
| room_type_reserved_Room_Type 6 | 0.000000 |
| room_type_reserved_Room_Type 5 | 0.000000 |
| room_type_reserved_Room_Type 4 | 0.000000 |
| room_type_reserved_Room_Type 3 | 0.000000 |
| room_type_reserved_Room_Type 2 | 0.000000 |
| type_of_meal_plan_Not Selected | 0.000000 |
| type_of_meal_plan_Meal Plan 2 | 0.000000 |
| no_of_week_nights | 0.000000 |
| no_of_adults | 0.000000 |
| no_of_previous_bookings_not_canceled | 0.000000 |
| no_of_previous_cancellations | 0.000000 |
| repeated_guest | 0.000000 |
| arrival_date | 0.000000 |
| arrival_month | 0.000000 |
| arrival_year | 0.000000 |
| no_of_children | 0.000000 |
| required_car_parking_space | 0.000000 |
| const | 0.000000 |
importances_best_model = best_model_head1.feature_importances_
indices = np.argsort(importances_best_model)
plt.figure(figsize=(12,12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances_best_model[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
comparison_frame_recall = pd.DataFrame({
    'Model': ['Logistic Regression sklearn', 'Logistic Regression-0.306', 'Logistic Regression - 0.40',
              'Decision Tree stats Model', 'Decision tree with restricted maximum depth',
              'Decision tree with hyperparameter tuning', 'Decision tree with post-pruning'],
    'Train_Recall': [0.61, 0.81, 0.71, 0.76, 0.748, 0.76, 0.76],
    'Test_Recall': [0.61, 0.81, 0.71, 0.77, 0.755, 0.77, 0.77]})
comparison_frame_recall
| | Model | Train_Recall | Test_Recall |
|---|---|---|---|
| 0 | Logistic Regression sklearn | 0.610 | 0.610 |
| 1 | Logistic Regression-0.306 | 0.810 | 0.810 |
| 2 | Logistic Regression - 0.40 | 0.710 | 0.710 |
| 3 | Decision Tree stats Model | 0.760 | 0.770 |
| 4 | Decision tree with restricted maximum depth | 0.748 | 0.755 |
| 5 | Decision tree with hyperparameter tuning | 0.760 | 0.770 |
| 6 | Decision tree with post-pruning | 0.760 | 0.770 |
comparison_frame_accuracy = pd.DataFrame({
    'Model': ['Logistic Regression sklearn', 'Logistic Regression-0.306', 'Logistic Regression - 0.40',
              'Decision Tree stats Model', 'Decision tree with restricted maximum depth',
              'Decision tree with hyperparameter tuning', 'Decision tree with post-pruning'],
    'Train_accuracy': [0.79, 0.77, 0.79, 0.76, 0.77, 0.76, 0.85],
    'Test_accuracy': [0.79, 0.77, 0.79, 0.76, 0.77, 0.76, 0.84]})
comparison_frame_accuracy
| | Model | Train_accuracy | Test_accuracy |
|---|---|---|---|
| 0 | Logistic Regression sklearn | 0.79 | 0.79 |
| 1 | Logistic Regression-0.306 | 0.77 | 0.77 |
| 2 | Logistic Regression - 0.40 | 0.79 | 0.79 |
| 3 | Decision Tree stats Model | 0.76 | 0.76 |
| 4 | Decision tree with restricted maximum depth | 0.77 | 0.77 |
| 5 | Decision tree with hyperparameter tuning | 0.76 | 0.76 |
| 6 | Decision tree with post-pruning | 0.85 | 0.84 |
comparison_frame_f1 = pd.DataFrame({
    'Model': ['Logistic Regression sklearn', 'Logistic Regression-0.306', 'Logistic Regression - 0.40',
              'Decision Tree stats Model', 'Decision tree with restricted maximum depth',
              'Decision tree with hyperparameter tuning', 'Decision tree with post-pruning'],
    'Train_f1': [0.67, 0.71, 0.70, 0.68, 0.69, 0.68, 0.68],
    'Test_f1': [0.67, 0.71, 0.70, 0.69, 0.69, 0.69, 0.68]})
comparison_frame_f1
| | Model | Train_f1 | Test_f1 |
|---|---|---|---|
| 0 | Logistic Regression sklearn | 0.67 | 0.67 |
| 1 | Logistic Regression-0.306 | 0.71 | 0.71 |
| 2 | Logistic Regression - 0.40 | 0.70 | 0.70 |
| 3 | Decision Tree stats Model | 0.68 | 0.69 |
| 4 | Decision tree with restricted maximum depth | 0.69 | 0.69 |
| 5 | Decision tree with hyperparameter tuning | 0.68 | 0.69 |
| 6 | Decision tree with post-pruning | 0.68 | 0.68 |
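The comparison frames above are typed in by hand, which makes copy errors easy. A minimal sketch of building such a table from the fitted models is given below; `metric_row`, the synthetic data, and the two demo models are hypothetical stand-ins rather than the models trained in this notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def metric_row(name, model, X_tr, y_tr, X_te, y_te):
    """Compute train/test recall, accuracy, and F1 for one fitted model."""
    pred_tr, pred_te = model.predict(X_tr), model.predict(X_te)
    return {
        "Model": name,
        "Train_Recall": recall_score(y_tr, pred_tr),
        "Test_Recall": recall_score(y_te, pred_te),
        "Train_Accuracy": accuracy_score(y_tr, pred_tr),
        "Test_Accuracy": accuracy_score(y_te, pred_te),
        "Train_F1": f1_score(y_tr, pred_tr),
        "Test_F1": f1_score(y_te, pred_te),
    }


# Synthetic stand-in data (assumption: NOT the hotel bookings data).
X_demo, y_demo = make_classification(n_samples=600, n_features=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)

demo_models = {
    "Logistic regression (demo)": LogisticRegression(max_iter=1000),
    "Decision tree, max_depth=3 (demo)": DecisionTreeClassifier(
        max_depth=3, random_state=1
    ),
}

rows = [
    metric_row(name, model.fit(X_tr, y_tr), X_tr, y_tr, X_te, y_te)
    for name, model in demo_models.items()
]
comparison_demo = pd.DataFrame(rows).round(3)
print(comparison_demo)
```

Computing every column from the same fitted model also keeps the rows consistent, since recall, accuracy, and F1 then always refer to one and the same classifier.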
Lead time, average price per room, number of special requests, and the online market segment play a major role in cancellations.